S.1. Additional simulation results.

S.1.1. Nonparametric component estimates. In Step 7 of our estimation procedure, we give both local linear and spline approaches to estimating the nonparametric component after the efficient estimator $\hat\beta_{\hat\Sigma}$ is obtained. In this section we examine their finite sample performance via simulations. For comparison, we also computed the respective initial estimates, that is, the versions using $\hat\beta_I$ instead of $\hat\beta_{\hat\Sigma}$. We considered the same settings as in Section 4, and we used cross-validation to choose the bandwidth in the local linear estimation. We computed the mean integrated squared error (MISE) of each component function estimate and took the average over the components. The results are given in Table S.1.
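The MISE summary described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the paper: we assume each of the $q$ component functions is evaluated on a common grid over $[0,1]$, that the integral is approximated by the trapezoidal rule, and that the reported figure averages the MISE over replications and then over components. The function name `mise_average` is hypothetical.

```python
import numpy as np

def mise_average(g_hat, g_true, grid):
    """Average MISE over component functions (hypothetical helper).

    g_hat  : array (R, q, G), estimates from R replications of q functions on G grid points
    g_true : array (q, G), true component functions on the same grid
    grid   : array (G,), increasing evaluation points
    """
    g_hat = np.asarray(g_hat, dtype=float)
    g_true = np.asarray(g_true, dtype=float)
    sq_err = (g_hat - g_true[None, :, :]) ** 2          # squared error, shape (R, q, G)
    dx = np.diff(grid)                                   # grid spacings, shape (G-1,)
    # trapezoidal rule along the last axis gives the integrated squared error, shape (R, q)
    ise = np.sum(0.5 * (sq_err[..., 1:] + sq_err[..., :-1]) * dx, axis=-1)
    mise_per_component = ise.mean(axis=0)                # average over replications
    return float(mise_per_component.mean())              # average over components
```

Averaging the per-component MISE values, rather than pooling all squared errors, keeps each component function equally weighted regardless of its scale.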

The figures in Table S.1 indicate that it is clearly advantageous to update the nonparametric component after efficient estimation of the parametric component. In addition, we observe that the refined local linear and spline estimators perform roughly the same in terms of MISE.

Table S.1
MISE for simulation studies.

                     Local linear estimate      Spline estimate
                     Initial     Refined        Initial     Refined
n = 100   ρ = .4     .0449       .0354          .0492       .0376
          ρ = .8     .0691       .0597          .0639       .0593
n = 200   ρ = .4     .0390       .0315          .0415       .0355
          ρ = .8     .0595       .0589          .0584       .0576


S.1.2. Parametric component estimates. Note that we adjusted the covariance function estimate $\hat\sigma(s,t)$ by setting all its negative eigenvalues to zero. We also considered a strictly positive threshold $\lambda_l = 0.05$ and set all eigenvalues lower than $\lambda_l$ to zero. The estimator using this covariance estimate is denoted by "Positive" in Table S.2. Because the "Positive" estimator also sets eigenvalues below a positive cut-off to zero, whereas the efficient estimator only adjusts the negative eigenvalues, it is slightly more biased than the efficient estimator. In all the considered cases, the crude and positive estimators are still more efficient than the working independence estimator.
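The two eigenvalue adjustments compared above can be sketched as follows, assuming the estimated covariance function has already been evaluated at one subject's observation times to give a symmetric matrix. `truncate_eigenvalues` is a hypothetical helper; `threshold=0.0` corresponds to zeroing only the negative eigenvalues, while `threshold=0.05` mimics the "Positive" variant.

```python
import numpy as np

def truncate_eigenvalues(cov, threshold=0.0):
    """Return cov with every eigenvalue below `threshold` set to zero."""
    cov = np.asarray(cov, dtype=float)
    cov = 0.5 * (cov + cov.T)                    # enforce exact symmetry
    eigval, eigvec = np.linalg.eigh(cov)         # spectral decomposition of a symmetric matrix
    eigval = np.where(eigval < threshold, 0.0, eigval)
    return (eigvec * eigval) @ eigvec.T          # eigvec @ diag(eigval) @ eigvec.T

# Example: eigenvalues 3 and -1; only the negative one is removed.
S_hat = np.array([[1.0, 2.0], [2.0, 1.0]])
S_psd = truncate_eigenvalues(S_hat)              # approximately [[1.5, 1.5], [1.5, 1.5]]
```

The output is positive semidefinite but may be singular, since zeroed eigenvalues are not replaced by positive values.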
Recall that in all the numerical analyses reported in the paper, $h_1$ and $h_2$ were selected via the commonly used leave-one-subject-out cross-validation, and the bandwidth $h_3$ used in the estimation of the covariance structure was set to $h_3 = 2h_1$. To examine the effect of the bandwidth choice, we considered various choices of $h_3$ in the numerical studies and obtained quite similar results. Under the column "Different $h_3$", we report the results for the case $h_3 = 1.5h_1$, which are similar to those obtained with $h_3 = 2h_1$.
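Leave-one-subject-out cross-validation removes all observations of one subject at a time, fits on the remaining subjects, and scores the prediction on the removed subject. A minimal sketch for a scalar local linear smoother follows. The Gaussian kernel, the candidate grid, and the names `local_linear` and `loso_cv` are our assumptions for illustration; the paper's estimators involve more structure than this. The last line applies the rule $h_3 = 2h_1$.

```python
import numpy as np

def local_linear(t_obs, y_obs, t_eval, h):
    """Local linear estimate at t_eval with a Gaussian kernel (assumed kernel)."""
    u = (t_obs[None, :] - t_eval[:, None]) / h           # scaled distances, shape (E, N)
    w = np.exp(-0.5 * u ** 2)                            # kernel weights
    s0, s1, s2 = w.sum(1), (w * u).sum(1), (w * u ** 2).sum(1)
    r0, r1 = (w * y_obs).sum(1), (w * u * y_obs).sum(1)
    return (s2 * r0 - s1 * r1) / (s0 * s2 - s1 ** 2)     # intercept of the weighted LS line

def loso_cv(t_list, y_list, bandwidths):
    """Leave-one-subject-out CV; returns the best bandwidth and all scores."""
    scores = []
    for h in bandwidths:
        err = 0.0
        for i in range(len(t_list)):
            t_train = np.concatenate([t for k, t in enumerate(t_list) if k != i])
            y_train = np.concatenate([y for k, y in enumerate(y_list) if k != i])
            pred = local_linear(t_train, y_train, t_list[i], h)
            err += np.sum((y_list[i] - pred) ** 2)       # error on the held-out subject
        scores.append(err)
    return bandwidths[int(np.argmin(scores))], np.array(scores)

rng = np.random.default_rng(0)
t_list = [rng.uniform(0, 1, 5) for _ in range(20)]
y_list = [np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(t.size) for t in t_list]
h1, scores = loso_cv(t_list, y_list, [0.05, 0.1, 0.2, 0.5])
h3 = 2 * h1
```

Holding out whole subjects, rather than single observations, keeps within-subject correlation from favoring overly small bandwidths.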

Our procedure does not require any iteration. In practice, it may be of interest to refine the estimates of the coefficients and covariances iteratively and take the final estimate at convergence. We report the numerical results under the "Iterative" column. The bias and SE are very close to those obtained without iteration.
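As a rough illustration of iterating "until convergence", the sketch below alternates between generalized least squares for the coefficients and a moment estimate of the working covariance. To keep it short, it assumes a purely parametric linear model with an exchangeable within-subject covariance $\sigma^2\{(1-\rho)I + \rho\mathbf{1}\mathbf{1}^T\}$. It omits the nonparametric component and the kernel covariance estimates of the actual procedure, and `iterative_gls` is a hypothetical name.

```python
import numpy as np

def iterative_gls(X_list, y_list, max_iter=50, tol=1e-8):
    """Alternate GLS for beta and moment estimates of an exchangeable covariance."""
    X, y = np.vstack(X_list), np.concatenate(y_list)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]              # working-independence start
    for _ in range(max_iter):
        res = [yi - Xi @ beta for Xi, yi in zip(X_list, y_list)]
        sigma2 = np.mean(np.concatenate(res) ** 2)
        # sum over pairs j != k of r_j * r_k, computed as (sum r)^2 - sum r^2
        cross = sum(r.sum() ** 2 - (r ** 2).sum() for r in res)
        n_pairs = sum(r.size * (r.size - 1) for r in res)
        rho = float(np.clip(cross / (n_pairs * sigma2), 0.0, 0.95))
        A = np.zeros((X.shape[1], X.shape[1]))
        b = np.zeros(X.shape[1])
        for Xi, yi in zip(X_list, y_list):
            m = yi.size
            S_inv = np.linalg.inv(sigma2 * ((1 - rho) * np.eye(m) + rho * np.ones((m, m))))
            A += Xi.T @ S_inv @ Xi
            b += Xi.T @ S_inv @ yi
        beta_new = np.linalg.solve(A, b)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, sigma2, rho

rng = np.random.default_rng(0)
beta_true = np.array([1.0, -0.5])
X_list, y_list = [], []
for _ in range(200):
    Xi = rng.standard_normal((5, 2))
    e = np.sqrt(0.5) * rng.standard_normal() + np.sqrt(0.5) * rng.standard_normal(5)  # rho = 0.5
    X_list.append(Xi)
    y_list.append(Xi @ beta_true + e)
beta_hat, sigma2_hat, rho_hat = iterative_gls(X_list, y_list)
```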


Table S.2
Estimation results of 200 simulations. "Positive" means we set a positive threshold for the covariance eigenvalues; "Different h3" means using a different choice of h3 in our efficient estimation; "Iterative" indicates an iterative estimation approach.

                       Positive          Different h3        Iterative
  n     ρ      β     bias      SE      bias      SE       bias      SE
 100   0.4    β1    .0173    .0411    -.0152    .0375    -.0146    .0361
              β2    .0176    .0423    -.0098    .0375    -.0095    .0352
              β3    .0205    .0425    -.0122    .0369    -.0099    .0360
              β4   -.0096    .0425     .0098    .0373    -.0086    .0362
 200   0.4    β1   -.0113    .0329     .0056    .0274     .0045    .0228
              β2   -.0164    .0334    -.0099    .0274    -.0066    .0219
              β3    .0120    .0323     .0072    .0273     .0034    .0259
              β4   -.0095    .0329    -.0043    .0276    -.0035    .0274
 100   0.8    β1    .0202    .0366     .0082    .0336     .0065    .0325
              β2    .0163    .0378    -.0075    .0335    -.0034    .0323
              β3    .0197    .0372     .0166    .0337     .0121    .0328
              β4   -.0168    .0354    -.0182    .0338     .0157    .0325
 200   0.8    β1   -.0044    .0214    -.0124    .0202     .0056    .0199
              β2    .0036    .0215     .0138    .0200    -.0049    .0199
              β3    .0042    .0215     .0165    .0204     .0052    .0178
              β4   -.0038    .0214    -.0148    .0200    -.0050    .0179
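For reference, bias and SE columns like those in Table S.2 are typically computed from the replicated estimates as sketched below. We assume here that "bias" is the average estimate minus the true value and "SE" is the sample standard deviation over the replications; the exact definitions are not restated in this section, and `bias_and_se` is a hypothetical helper.

```python
import numpy as np

def bias_and_se(estimates, beta0):
    """estimates: (R, p) array of replicated estimates; beta0: (p,) true values."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean(axis=0) - np.asarray(beta0, dtype=float)   # Monte Carlo bias
    se = est.std(axis=0, ddof=1)                                # Monte Carlo standard deviation
    return bias, se
```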


S.2. Proofs of Propositions 1-3 and Lemma 1. In this section, we outline the proofs of Propositions 1-3 and present the proof of Lemma 1. When $m_i$ is uniformly bounded, the same results hold for general link functions by closely following the arguments of [3]; we outline these results at the end of this supplement. Note that the sub-Gaussian error assumption is necessary in that case. We outline the proofs of Propositions 1-3 because we allow some of the $m_i$'s to diverge, as in Assumptions A1 and A2.
Proof of Proposition 1. First we consider the properties of $\Gamma_V$. The $(k,l)$th element of $n^{-1}H_{11\cdot 2}$ is given by

$$\langle X_k - Z^T\hat\varphi_{Vk},\, X_l - Z^T\hat\varphi_{Vl}\rangle_n.$$

From Lemma 1 (v)-(vii), we have

$$\begin{aligned}\langle X_k - Z^T\hat\varphi_{Vk},\, X_l - Z^T\hat\varphi_{Vl}\rangle_n &= \langle X_k - Z^T\varphi^*_{Vk},\, X_l - Z^T\varphi^*_{Vl}\rangle_n + o_p(1)\\ &= \langle X_k - Z^T\varphi^*_{Vk},\, X_l - Z^T\varphi^*_{Vl}\rangle_V + o_p(1).\end{aligned}$$

This and (2.5) imply that for some positive constants $C_1$ and $C_2$, we have

$$C_1 \le \lambda_{\min}(n^{-1}H_{11\cdot 2}) \le \lambda_{\max}(n^{-1}H_{11\cdot 2}) \le C_2$$

and hence

$$\frac{1}{nC_2} \le \lambda_{\min}(H^{11}) \le \lambda_{\max}(H^{11}) \le \frac{1}{nC_1} \tag{S.1}$$

with probability tending to 1. Note that

$$\mathrm{Var}(\hat\beta_V \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}) = \Gamma_V,$$

and Theorem 1 of [13] implies that $\Gamma_V - H^{11}$ is nonnegative definite when $H^{11}$ is defined with $V_i = \Sigma_i$. Hence for some positive constant $C_3$, we have

$$\lambda_{\min}(\Gamma_V) \ge \frac{C_3}{n}$$

with probability tending to 1.

Now we prove the asymptotic normality of

$$\hat\beta_V - E\{\hat\beta_V \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}\} = H^{11}\Big(\sum_{i=1}^n X_i^TV_i^{-1}\epsilon_i - H_{12}H_{22}^{-1}\sum_{i=1}^n W_i^TV_i^{-1}\epsilon_i\Big).$$

As in the proof of Theorem 2 of [13], we take $c \in \mathbb{R}^p$ such that $|c| = 1$ and write

$$c^T(\hat\beta_V - E\{\hat\beta_V \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}\}) = \sum_{i=1}^n a_i\eta_i \quad\text{(say)},$$

where

$$a_i^2 = c^TH^{11}(X_i - W_iH_{22}^{-1}H_{21})^TV_i^{-1}\Sigma_iV_i^{-1}(X_i - W_iH_{22}^{-1}H_{21})H^{11}c$$

and $\{\eta_i\}$ is a sequence of conditionally independent random variables with $E\{\eta_i \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}\} = 0$ and $\mathrm{Var}(\eta_i \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}) = 1$.

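We have from (S.1) and Lemma 1 (vii) that

$$\max_{1\le i\le n} a_i^2 = O_p\Big(\frac{m_{\max}}{n^2}\sum_{k=1}^p\|X_k - Z^T\hat\varphi_{Vk}\|_\infty^2\Big) = O_p\Big(\frac{m_{\max}}{n^2}\Big).$$

On the other hand, we have for some positive constant $C_4$,

$$\sum_{i=1}^n a_i^2 = c^T\Gamma_Vc \ge \frac{C_4}{n}$$

with probability tending to 1. Hence we have established

$$\frac{\max_{1\le i\le n} a_i^2}{\sum_{i=1}^n a_i^2} = O_p\Big(\frac{m_{\max}}{n}\Big) = o_p(1),$$

and it follows from the standard argument that

$$\Big(\sum_{i=1}^n a_i^2\Big)^{-1/2}\sum_{i=1}^n a_i\eta_i \xrightarrow{d} N(0,1). \tag{S.2}$$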

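Finally we evaluate the conditional bias:

$$\mathrm{Bias}_V = E\{\hat\beta_V \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}\} - \beta_0.$$

Take $\bar g \in G_B$ such that $\|g_0 - \bar g\|_{G,\infty} = O(K_n^{-2})$ and set

$$\delta_0 = g_0 - \bar g \quad\text{and}\quad \Delta_0 = Z^T\delta_0.$$

Note that

$$\|\Delta_0\|_\infty = O(K_n^{-2}) \quad\text{and}\quad \|\Delta_0\|_V = O(K_n^{-2}).$$

We also take $\bar\varphi_{Vk} \in G_B$ such that $\|\varphi^*_{Vk} - \bar\varphi_{Vk}\|_{G,\infty} = O(K_n^{-2})$. Then we have the following expression for the conditional bias:

$$\mathrm{Bias}_V = nH^{11}(S_1, \ldots, S_p)^T,$$

where

$$\begin{aligned} S_k &= \langle X_k,\, \Delta_0 - Z^T\hat\Pi_{Vn}\Delta_0\rangle_n = \langle X_k - Z^T\hat\varphi_{Vk},\, \Delta_0 - Z^T\hat\Pi_{Vn}\Delta_0\rangle_n\\ &= \langle X_k - Z^T\varphi^*_{Vk},\, \Delta_0 - Z^T\Pi_{Vn}\Delta_0\rangle_n\\ &\quad + \langle X_k - Z^T\varphi^*_{Vk},\, Z^T\Pi_{Vn}\Delta_0 - Z^T\hat\Pi_{Vn}\Delta_0\rangle_n\\ &\quad + \langle Z^T\varphi^*_{Vk} - Z^T\hat\varphi_{Vk},\, \Delta_0 - Z^T\hat\Pi_{Vn}\Delta_0\rangle_n\\ &= S_{1k} + S_{2k} + S_{3k} \quad\text{(say)}.\end{aligned}$$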


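Note that

$$E\{S_{1k}\} = 0 \quad\text{and}\quad E\{S_{1k}^2\} = O\big(n^{-1}\|\Delta_0 - Z^T\Pi_{Vn}\Delta_0\|_\infty^2\big),$$

since $S_{1k}$ is a sum of independent random variables, $X_k - Z^T\varphi^*_{Vk}$ is orthogonal to $Z^TG$ with respect to $\langle\cdot,\cdot\rangle_V$, and

$$\|\Delta_0 - Z^T\Pi_{Vn}\Delta_0\|_\infty \le \|\Delta_0\|_\infty + CK_n^{1/2}\|Z^T\Pi_{Vn}\Delta_0\|_V \le \|\Delta_0\|_\infty + CK_n^{1/2}\|\Delta_0\|_V = O(K_n^{-3/2}).$$

Hence we have

$$S_{1k} = O_p\big((nK_n^3)^{-1/2}\big) = o_p(n^{-1/2}).$$

Now we deal with $S_{2k}$. From Lemma 1 (vi) and the fact that $\|\Delta_0 - Z^T\Pi_{Vn}\Delta_0\|_\infty = O(K_n^{-3/2})$, we have

$$\begin{aligned}\|Z^T\hat\Pi_{Vn}\Delta_0 - Z^T\Pi_{Vn}\Delta_0\|_n &\le C\sup_{g\in G_B}\frac{|\langle\Delta_0 - Z^T\Pi_{Vn}\Delta_0, Z^Tg\rangle_n - \langle\Delta_0 - Z^T\Pi_{Vn}\Delta_0, Z^Tg\rangle_V|}{\|Z^Tg\|_V}\\ &= O_p\big(K_n^{-3/2}(K_n/n)^{1/2}\big) = O_p(K_n^{-1}n^{-1/2}).\end{aligned}$$

Thus we have

$$|S_{2k}| = o_p(n^{-1/2}).$$

We also have

$$|S_{3k}| \le \|\Delta_0\|_n\,\|Z^T(\varphi^*_{Vk} - \hat\varphi_{Vk})\|_n = O_p\big(K_n^{-2}\{K_n^{-2} + (K_n/n)^{1/2}\}\big) = o_p(n^{-1/2})$$

since $\|\Delta_0 - Z^T\hat\Pi_{Vn}\Delta_0\|_n \le \|\Delta_0\|_n$. Hence we have

$$\mathrm{Bias}_V = o_p(n^{-1/2}).$$

The desired result follows from (S.2) and the above equality.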


As for Proposition 2, there is almost no change in the calculation of the score functions in [13] and [4], and we omit the outline. This is because $m_i$ is bounded for any fixed $n$.


Proof of Proposition 3. When $V_i = \Sigma_i$, we have

$$\Gamma_V = H^{11} = (H_{11\cdot 2})^{-1} \quad\text{and}\quad \varphi^*_{Vk} = \varphi_{\mathrm{eff},k}.$$

Lemma 1 (vii) implies that

$$\frac{1}{n}\Gamma_V^{-1} = \frac{1}{n}H_{11\cdot 2} = \Omega_{\Sigma n} + o_p(1) = \Omega_\Sigma + o_p(1).$$

The desired result follows from the above result and Proposition 1.

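Proof of Lemma 1. The proof consists of seven parts.

(i) Recall that

$$(\|Z^Tg\|_V)^2 = E\Big\{\frac{1}{n}\sum_{i=1}^n (Z^Tg)_i^TV_i^{-1}(Z^Tg)_i\Big\}.$$

We have from Assumptions A4 and A5 that

$$\frac{C_1}{n}E\Big\{\sum_{i=1}^n\frac{1}{m_i}\sum_{j=1}^{m_i}\big(Z_{ij}^Tg(T_{ij})\big)^2\Big\} \le (\|Z^Tg\|_V)^2 \le \frac{C_2}{n}E\Big\{\sum_{i=1}^n\sum_{j=1}^{m_i}\big(Z_{ij}^Tg(T_{ij})\big)^2\Big\} \tag{S.3}$$

for some positive constants $C_1$ and $C_2$. Assumptions A2 and A3 imply that for some positive constants $C_3$ and $C_4$,

$$\begin{aligned} C_3\sum_{l=1}^q\int g_l^2(t)\,dt &\le \frac{1}{n}E\Big\{\sum_{i=1}^n\frac{1}{m_i}\sum_{j=1}^{m_i}g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\},\\ \frac{1}{n}E\Big\{\sum_{i=1}^n\sum_{j=1}^{m_i}g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\} &\le C_4\sum_{l=1}^q\int g_l^2(t)\,dt.\end{aligned}\tag{S.4}$$

The desired result follows from (S.3) and (S.4).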

(ii) This is a well-known result in the literature on spline regression; see, for example, A.2 of [12].

(iii) The result in (ii) implies

$$\|X^T\beta + Z^Tg\|_\infty^2 \le CK_n(|\beta|^2 + \|g\|_{G,2}^2)$$

for some positive constant $C$. Recall that $p$ and $q$ are fixed in this paper. On the other hand, we have from Assumptions A1-A3 and A5 that, for some positive constant $C_1$,

$$(\|X^T\beta + Z^Tg\|_V)^2 \ge C_1(|\beta|^2 + \|g\|_{G,2}^2).$$

Besides, we have for some positive constants $C_1$ and $C_2$,

$$(\|v\|_n)^2 \le \frac{C_1}{n}\sum_{i=1}^n\sum_{j=1}^{m_i}|v_{ij}|^2 \le C_2\|v\|_\infty^2.$$

Hence the desired results are established.

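(iv) For $g_1 \in G_B$ and $g_2 \in G_B$, we have

$$\langle Z^Tg_1, Z^Tg_2\rangle_n = \gamma_1^T\Big\{\frac{1}{n}\sum_{i=1}^n W_i^TV_i^{-1}W_i\Big\}\gamma_2 = \gamma_1^TA_{Vn}\gamma_2 \quad\text{(say)},$$

where $A_{Vn}$ is a $qK_n\times qK_n$ matrix, and $\gamma_1$ and $\gamma_2$ are the coefficient vectors of $g_1$ and $g_2$, respectively. The elements of $A_{Vn} = n^{-1}\sum_{i=1}^nW_i^TV_i^{-1}W_i$ are written as

$$A_{Vn}^{(k_1,l_1,k_2,l_2)} = \frac{1}{n}\sum_{i=1}^n\sum_{j_1,j_2}v_i^{j_1j_2}Z_{ij_1l_1}B_{k_1}(T_{ij_1})Z_{ij_2l_2}B_{k_2}(T_{ij_2}) \quad\text{(say)}, \tag{S.5}$$

where $v_i^{j_1j_2}$ is defined in (??), $1\le k_1,k_2\le K_n$, and $1\le l_1,l_2\le q$. By evaluating the variance of (S.5) and using the Bernstein inequality for independent bounded random variables and Assumptions A1 and A2, we have, uniformly in $k_1$, $k_2$, $l_1$, and $l_2$,

$$A_{Vn}^{(k_1,l_1,k_2,l_2)} - E\big(A_{Vn}^{(k_1,l_1,k_2,l_2)}\big) = O_p\Big(\Big(\frac{\log n}{nK_n^2}\Big)^{1/2}\Big) \quad\text{if } B_{k_1}(t)B_{k_2}(t) \equiv 0 \tag{S.6}$$

and

$$A_{Vn}^{(k_1,l_1,k_2,l_2)} - E\big(A_{Vn}^{(k_1,l_1,k_2,l_2)}\big) = O_p\Big(\Big(\frac{\log n}{nK_n}\Big)^{1/2}\Big) \quad\text{if } B_{k_1}(t)B_{k_2}(t) \not\equiv 0. \tag{S.7}$$

By exploiting (S.6), (S.7), and the local property of the B-spline basis, we obtain

$$\max\{|\lambda_{\min}(A_{Vn} - E(A_{Vn}))|,\, |\lambda_{\max}(A_{Vn} - E(A_{Vn}))|\} = o_p(K_n^{-1}). \tag{S.8}$$

We also have

$$\frac{C_1}{K_n} \le \lambda_{\min}(E(A_{Vn})) \le \lambda_{\max}(E(A_{Vn})) \le \frac{C_2}{K_n} \tag{S.9}$$

since Assumptions A2 and A3 yield

$$\frac{C_3}{n}\sum_{i=1}^n\frac{1}{m_i}\sum_{j=1}^{m_i}E\{(Z_{ij}\otimes B(T_{ij}))(Z_{ij}\otimes B(T_{ij}))^T\} \le E(A_{Vn}) \le \frac{C_4}{n}\sum_{i=1}^n\sum_{j=1}^{m_i}E\{(Z_{ij}\otimes B(T_{ij}))(Z_{ij}\otimes B(T_{ij}))^T\}$$

for some positive constants $C_3$ and $C_4$. See the proof of Lemma A.3 of [12]. Hence the desired result follows from (S.8) and (S.9).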
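The eigenvalue bounds of order $1/K_n$ in (S.9) rest on the local support of the B-spline basis. A quick numerical check is sketched below in the simplest setting. We assume $q = 1$, $Z_{ij} \equiv 1$, $V_i = I$, and $T$ uniform on $[0,1]$, so that $E(A_{Vn})$ reduces, up to constants, to the Gram matrix $\int_0^1 B(t)B(t)^T\,dt$ of cubic B-splines with equally spaced knots. The Cox-de Boor recursion is standard; the helper names are hypothetical.

```python
import numpy as np

def bspline_basis(x, n_basis, degree=3):
    """Evaluate n_basis clamped B-splines of the given degree on [0, 1]
    (equally spaced interior knots) at points x via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    n_interior = n_basis - degree - 1
    inner = np.linspace(0.0, 1.0, n_interior + 2)
    knots = np.concatenate([np.zeros(degree), inner, np.ones(degree)])
    # degree-0 pieces: indicators of the knot intervals [t_k, t_{k+1})
    B = np.zeros((x.size, knots.size - 1))
    for k in range(knots.size - 1):
        if knots[k] < knots[k + 1]:
            B[:, k] = (x >= knots[k]) & (x < knots[k + 1])
    last = max(k for k in range(knots.size - 1) if knots[k] < knots[k + 1])
    B[x == knots[-1], last] = 1.0                 # include the right end point t = 1
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, B.shape[1] - 1))
        for k in range(B.shape[1] - 1):
            den1 = knots[k + d] - knots[k]
            den2 = knots[k + d + 1] - knots[k + 1]
            if den1 > 0:
                B_new[:, k] += (x - knots[k]) / den1 * B[:, k]
            if den2 > 0:
                B_new[:, k] += (knots[k + d + 1] - x) / den2 * B[:, k + 1]
        B = B_new
    return B

def gram_eigen_range(n_basis, n_grid=4000):
    """Smallest and largest eigenvalues of the Gram matrix int_0^1 B(t) B(t)^T dt."""
    t = (np.arange(n_grid) + 0.5) / n_grid        # midpoint rule on [0, 1]
    B = bspline_basis(t, n_basis)
    ev = np.linalg.eigvalsh(B.T @ B / n_grid)
    return ev[0], ev[-1]

for K in (10, 20, 40):
    lam_min, lam_max = gram_eigen_range(K)
    print(K, K * lam_min, K * lam_max)            # K * eigenvalue should stay bounded away from 0 and infinity
```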

(v) This follows from (iv) and (vi). 

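(vi) Using Assumptions A1 and A2, we have

$$\langle\delta_n, Z_lB_k\rangle_n = \frac{1}{n}\sum_{i=1}^n\sum_{j_1,j_2}\delta_{n,ij_1}v_i^{j_1j_2}Z_{ij_2l}B_k(T_{ij_2})$$

and

$$\mathrm{Var}(\langle\delta_n, Z_lB_k\rangle_n) \le \frac{C_1\|\delta_n\|_\infty^2}{n^2}\sum_{i=1}^n\sum_{j_1,j_2}E\{B_k(T_{ij_1})B_k(T_{ij_2})\} \le \frac{C_2\|\delta_n\|_\infty^2}{nK_n}$$

for some positive constants $C_1$ and $C_2$. Hence we have

$$\sum_{l=1}^q\sum_{k=1}^{K_n}\mathrm{Var}(\langle\delta_n, Z_lB_k\rangle_n) \le \frac{C}{n}\|\delta_n\|_\infty^2$$

for some positive constant $C$, and the desired result follows from (S.9).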

(vii) Take $\bar\varphi_{Vk} \in G_B$ such that $\|\varphi^*_{Vk} - \bar\varphi_{Vk}\|_{G,\infty} = O(K_n^{-2})$. Then we have, for some positive constant $C$,

$$\begin{aligned}\|Z^T(\varphi_{Vk} - \varphi^*_{Vk})\|_\infty &\le \|Z^T(\varphi_{Vk} - \bar\varphi_{Vk})\|_\infty + \|Z^T(\bar\varphi_{Vk} - \varphi^*_{Vk})\|_\infty\\ &\le C\sqrt{K_n}\,\|Z^T(\varphi_{Vk} - \bar\varphi_{Vk})\|_V + \|Z^T(\bar\varphi_{Vk} - \varphi^*_{Vk})\|_\infty\\ &\le C\sqrt{K_n}\,\|Z^T(\bar\varphi_{Vk} - \varphi^*_{Vk})\|_V + \|Z^T(\bar\varphi_{Vk} - \varphi^*_{Vk})\|_\infty\\ &= O(K_n^{-3/2}).\end{aligned}\tag{S.10}$$

Here we used the facts that $\varphi_{Vk} = \Pi_{Vn}X_k \in G_B$ and $\varphi^*_{Vk} = \Pi_VX_k$. Inequality (S.10) implies $\|Z^T\varphi_{Vk}\|_\infty = O(1)$, and we have only to evaluate $Z^T(\hat\varphi_{Vk} - \varphi_{Vk})$. We can simply follow the arguments on p.16 of [3], replacing $\hat\varphi_{k,n}$ and $\varphi_{k,n}$ with $Z^T\hat\varphi_{Vk}$ and $Z^T\varphi_{Vk}$, since those arguments employ (iv) and (vi) and do not depend on $m_i$. Then we have

$$\|Z^T(\hat\varphi_{Vk} - \varphi_{Vk})\|_\infty = o_p(1), \qquad \|Z^T(\hat\varphi_{Vk} - \varphi_{Vk})\|_n = O_p(\sqrt{K_n/n}),$$

and $\|Z^T(\hat\varphi_{Vk} - \varphi_{Vk})\|_V = O_p(\sqrt{K_n/n})$. The desired results follow from the above equations and (S.10).


S.3. Proof of Proposition 4. In the proof, we repeatedly use arguments based on exponential inequalities, truncation, and division of regions into small rectangles to prove uniform convergence results as in [S3]. We do not give the details of these arguments since they are standard in nonparametric kernel methods. Since we impose Assumption A2 and we do not use $\Sigma_i$ or $V_i$ in the construction of $\hat g(t)$, $\hat\sigma^2(t)$, and $\hat\sigma(s,t)$, the effects of diverging $m_i$ appear explicitly only when we apply the exponential inequality for generalized U-statistics. Recall that we assume three times continuous differentiability of the relevant functions in this proposition.

The proof consists of four parts: (i) representation of $\hat g(t)$, (ii) representation of $\hat\epsilon_{ij}$, (iii) representation of $\hat\sigma^2(t)$, and (iv) representation of $\hat\sigma(s,t)$.

(i) Representation of $\hat g(t)$. Applying the third-order Taylor series expansion to $g_0(t)$, we have

$$Z_{ij}^Tg_0(T_{ij}) = Z_{ij}^T\Big\{g_0(t) + h_1\frac{T_{ij}-t}{h_1}g_0'(t) + \frac{h_1^2}{2}\Big(\frac{T_{ij}-t}{h_1}\Big)^2g_0''(t)\Big\} + O(h_1^3), \tag{S.11}$$

where $g_0'(t) = (g_{01}'(t),\ldots,g_{0q}'(t))^T$ and $g_0''(t) = (g_{01}''(t),\ldots,g_{0q}''(t))^T$. By plugging (S.11) into (3.2), we have, uniformly in $t$,

$$\hat g(t) = g_0(t) + D_q(\hat L_1(t))^{-1}\hat L_2(t)(\beta_0 - \hat\beta_I) + \frac{h_1^2}{2}D_q(\hat L_1(t))^{-1}\hat L_3(t)g_0''(t) + D_q(\hat L_1(t))^{-1}E_0(t) + O_p(h_1^3), \tag{S.12}$$
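where $\hat L_1(t) = A_{1n}(t)$ is defined after (3.2),

$$\hat L_2(t) = \frac{1}{N_1h_1}\sum_{i=1}^n\sum_{j=1}^{m_i}\begin{pmatrix}Z_{ij}\\ \frac{T_{ij}-t}{h_1}Z_{ij}\end{pmatrix}X_{ij}^TK\Big(\frac{T_{ij}-t}{h_1}\Big),$$

$$\hat L_3(t) = \frac{1}{N_1h_1}\sum_{i=1}^n\sum_{j=1}^{m_i}\begin{pmatrix}Z_{ij}\\ \frac{T_{ij}-t}{h_1}Z_{ij}\end{pmatrix}\Big(\frac{T_{ij}-t}{h_1}\Big)^2Z_{ij}^TK\Big(\frac{T_{ij}-t}{h_1}\Big),$$

$$E_0(t) = \frac{1}{N_1h_1}\sum_{i=1}^n\sum_{j=1}^{m_i}\begin{pmatrix}Z_{ij}\\ \frac{T_{ij}-t}{h_1}Z_{ij}\end{pmatrix}\epsilon_{ij}K\Big(\frac{T_{ij}-t}{h_1}\Big).$$

By following standard arguments such as those in [S3], we obtain for $j = 1, 2, 3$,

$$\hat L_j(t) = L_j(t) + O_p\Big(\Big(\frac{\log n}{nh_1}\Big)^{1/2}\Big) \quad\text{uniformly in } t, \tag{S.13}$$

where $L_j(t) = E\{\hat L_j(t)\}$, and

$$E_0(t) = O_p\Big(\Big(\frac{\log n}{nh_1}\Big)^{1/2}\Big) \quad\text{uniformly in } t. \tag{S.14}$$

Assumption A2 implies that

$$C_1I_{2q} \le L_1(t) \le C_2I_{2q} \tag{S.15}$$

for some positive constants $C_1$ and $C_2$. From (S.12)-(S.15), we have, uniformly in $t$,

$$\begin{aligned}\hat g(t) &= g_0(t) + D_q(L_1(t))^{-1}L_2(t)(\beta_0 - \hat\beta_I) + \frac{h_1^2}{2}D_q(L_1(t))^{-1}L_3(t)g_0''(t) + D_q(L_1(t))^{-1}E_0(t)\\ &\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\Big(\frac{\log n}{nh_1}\Big)^{1/2}\Big)\\ &= g_0(t) + L_4(t)(\beta_0 - \hat\beta_I) + h_1^2L_5(t)g_0''(t) + L_6(t)E_0(t)\\ &\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\Big(\frac{\log n}{nh_1}\Big)^{1/2}\Big).\end{aligned}\tag{S.16}$$

Note that all the elements of $L_j(t)$, $j = 4, 5, 6$, are bounded functions of $t$.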
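(ii) Representation of $\hat\epsilon_{ij}$. We have

$$\hat\epsilon_{ij} = \epsilon_{ij} + X_{ij}^T(\beta_0 - \hat\beta_I) + Z_{ij}^T(g_0(T_{ij}) - \hat g(T_{ij})).$$

By plugging (S.16) into the above equality, we obtain, uniformly in $i$ and $j$,

$$\begin{aligned}\hat\epsilon_{ij} &= \epsilon_{ij} + (X_{ij}^T - Z_{ij}^TL_4(T_{ij}))(\beta_0 - \hat\beta_I) - h_1^2Z_{ij}^TL_5(T_{ij})g_0''(T_{ij})\\ &\quad - Z_{ij}^TL_6(T_{ij})E_0(T_{ij}) + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\Big(\frac{\log n}{nh_1}\Big)^{1/2}\Big)\\ &= \epsilon_{ij} + M_{ij}^{(1)}(\beta_0 - \hat\beta_I) + h_1^2M_{ij}^{(2)}g_0''(T_{ij}) + M_{ij}^{(3)}E_0(T_{ij})\\ &\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\Big(\frac{\log n}{nh_1}\Big)^{1/2}\Big).\end{aligned}\tag{S.17}$$

Note that all the elements of $M_{ij}^{(1)}$, $M_{ij}^{(2)}$, and $M_{ij}^{(3)}$ are uniformly bounded functions of $X_{ij}$, $Z_{ij}$, and $T_{ij}$.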

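(iii) Representation of $\hat\sigma^2(t)$. We have, uniformly in $i$ and $j$,

$$\begin{aligned}(\hat\epsilon_{ij})^2 &= \epsilon_{ij}^2 - \sigma^2(T_{ij}) + \sigma^2(T_{ij}) + 2\epsilon_{ij}M_{ij}^{(3)}E_0(T_{ij})\\ &\quad + 2\epsilon_{ij}M_{ij}^{(1)}(\beta_0 - \hat\beta_I) + 2\epsilon_{ij}h_1^2M_{ij}^{(2)}g_0''(T_{ij}).\end{aligned}\tag{S.18}$$

Recall that $M_{ij}^{(l)}$, $l = 1, 2, 3$, are defined in (S.17). It is easy to see that the contributions of $2\epsilon_{ij}M_{ij}^{(1)}(\beta_0 - \hat\beta_I)$ and $2\epsilon_{ij}h_1^2M_{ij}^{(2)}g_0''(T_{ij})$ to $\hat\sigma^2(t)$ are

$$O_p\Big(n^{-1/2}\Big(\frac{\log n}{nh_2}\Big)^{1/2}\Big) \quad\text{and}\quad O_p\Big(h_1^2\Big(\frac{\log n}{nh_2}\Big)^{1/2}\Big)$$

uniformly in $t$, respectively. Thus we have only to consider $\epsilon_{ij}^2 - \sigma^2(T_{ij})$, $\sigma^2(T_{ij})$, and $2\epsilon_{ij}M_{ij}^{(3)}E_0(T_{ij})$ in (S.18).

Setting $\hat L_7(t) = A_{2n}(t)$, which is defined after (3.3), we have for some positive constants $C_1$ and $C_2$,

$$\hat L_7(t) = L_7(t) + O_p\Big(\Big(\frac{\log n}{nh_2}\Big)^{1/2}\Big) \quad\text{and}\quad C_1I_2 \le L_7(t) \le C_2I_2 \tag{S.19}$$

uniformly in $t$, where $L_7(t) = E\{\hat L_7(t)\}$. Now we have, uniformly in $t$,

$$\hat\sigma^2(t) = (1\ 0)(\hat L_7(t))^{-1}(E_1(t) + \mathrm{Bias}_1(t) + R_1(t)), \tag{S.20}$$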
where $E_1(t)$ is defined in Proposition 4, $\mathrm{Bias}_1(t)$ is the term of $\sigma^2(T_{ij})$, and $R_1(t)$ is the term of $2\epsilon_{ij}M_{ij}^{(3)}E_0(T_{ij})$. It is easy to see that uniformly in $t$,


(S.21) 




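By applying the Taylor series expansion, we have

$$\sigma^2(T_{ij}) = \sigma^2(t) + h_2(\sigma^2)'(t)\frac{T_{ij}-t}{h_2} + \frac{h_2^2}{2}(\sigma^2)''(t)\Big(\frac{T_{ij}-t}{h_2}\Big)^2 + O(h_2^3).$$

Therefore $\mathrm{Bias}_1(t)$ can be represented as

$$\mathrm{Bias}_1(t) = \hat L_7(t)\begin{pmatrix}\sigma^2(t)\\ h_2(\sigma^2)'(t)\end{pmatrix} + \frac{h_2^2(\sigma^2)''(t)}{2N_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\begin{pmatrix}1\\ \frac{T_{ij}-t}{h_2}\end{pmatrix}\Big(\frac{T_{ij}-t}{h_2}\Big)^2K\Big(\frac{T_{ij}-t}{h_2}\Big) + O_p(h_2^3)$$

uniformly in $t$. Setting

$$\hat L_8(t) = \frac{1}{N_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\begin{pmatrix}\big(\frac{T_{ij}-t}{h_2}\big)^2\\ \big(\frac{T_{ij}-t}{h_2}\big)^3\end{pmatrix}K\Big(\frac{T_{ij}-t}{h_2}\Big),$$

we have, uniformly in $t$,

$$\hat L_8(t) = L_8(t) + O_p\Big(\Big(\frac{\log n}{nh_2}\Big)^{1/2}\Big),$$

where $L_8(t) = E\{\hat L_8(t)\}$ is a bounded vector function of $t$. Hence we have, uniformly in $t$,

$$\mathrm{Bias}_1(t) = \hat L_7(t)\begin{pmatrix}\sigma^2(t)\\ h_2(\sigma^2)'(t)\end{pmatrix} + \frac{h_2^2}{2}L_8(t)(\sigma^2)''(t) + O_p(h_2^3) + O_p\Big(h_2^2\Big(\frac{\log n}{nh_2}\Big)^{1/2}\Big).$$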


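Next we deal with $R_1(t)$, which can be written as

$$\frac{1}{N_1^2h_1h_2}\sum_{a,b}\sum_{i=1}^n\sum_{j=1}^{m_i}\sum_{i'=1}^n\sum_{j'=1}^{m_{i'}}\epsilon_{ij}\epsilon_{i'j'}A_{ab,ij}B_{ab,i'j'}K_a\Big(\frac{T_{ij}-t}{h_2}\Big)K_b\Big(\frac{T_{i'j'}-T_{ij}}{h_1}\Big), \tag{S.23}$$

where $K_a(t) = t^aK(t)$, $a = 0, 1$, and $b = 0, 1$. Note that $A_{ab,ij}$ and $B_{ab,i'j'}$ are uniformly bounded functions of $(X_{ij}, Z_{ij}, T_{ij})$ and $(X_{i'j'}, Z_{i'j'}, T_{i'j'})$, respectively. We evaluate

$$\begin{aligned}&\frac{1}{N_1^2h_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\sum_{i'=1}^n\sum_{j'=1}^{m_{i'}}\epsilon_{ij}\epsilon_{i'j'}A_{ab,ij}B_{ab,i'j'}K_a\Big(\frac{T_{ij}-t}{h_2}\Big)K_b\Big(\frac{T_{i'j'}-T_{ij}}{h_1}\Big)\\ &= \frac{1}{N_1^2h_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\epsilon_{ij}^2A_{ab,ij}B_{ab,ij}K_a\Big(\frac{T_{ij}-t}{h_2}\Big)K_b(0)\\ &\quad + \frac{1}{N_1^2h_1h_2}\sum_{i=1}^n\sum_{j\ne j'}\epsilon_{ij}\epsilon_{ij'}A_{ab,ij}B_{ab,ij'}K_a\Big(\frac{T_{ij}-t}{h_2}\Big)K_b\Big(\frac{T_{ij'}-T_{ij}}{h_1}\Big)\\ &\quad + \frac{1}{N_1^2h_1h_2}\sum_{i\ne i'}\sum_{j,j'}\epsilon_{ij}\epsilon_{i'j'}A_{ab,ij}B_{ab,i'j'}K_a\Big(\frac{T_{ij}-t}{h_2}\Big)K_b\Big(\frac{T_{i'j'}-T_{ij}}{h_1}\Big)\\ &= R^{(1)}_{ab}(t) + R^{(2)}_{ab}(t) + R^{(3)}_{ab}(t) \quad\text{(say)}.\end{aligned}\tag{S.24}$$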


Note that we cannot apply classical exponential inequalities for U-statistics since the kernel functions depend on $i$ and $i'$ and the observations are not identically distributed. It is easy to see that uniformly in $t$,

$$R^{(1)}_{ab}(t) = O_p((nh_1)^{-1}) \quad\text{and}\quad R^{(2)}_{ab}(t) = o_p(n^{-1/2}). \tag{S.25}$$

We evaluate $R^{(3)}_{ab}(t)$ by using an exponential inequality such as the one given in (3.5) of [S1] with $A = C_1(\log n)^km_{\max}^2/(n^2h_1h_2)$,

$$B^2 = \frac{C_2(\log n)^{2k}m_{\max}^2}{n^3h_1h_2}\big(h_1^{-1} + h_2^{-1}\big),$$

$C = C_3/(nh_1^{1/2}h_2^{1/2})$, and $x = M\log n/(nh_1^{1/2}h_2^{1/2})$ in the inequality, together with standard arguments in nonparametric regression as in [S3]. Note that we used a kind of truncation technique to handle $\epsilon_{ij}$ and that we have to take sufficiently large $k$ and $M$ here. Hence we have

$$R^{(3)}_{ab}(t) = O_p\Big(\frac{\log n}{n(h_1h_2)^{1/2}}\Big).$$

The above equation and (S.23)-(S.25) imply that

$$R_1(t) = O_p\Big(\frac{\log n}{n(h_1h_2)^{1/2}}\Big) \tag{S.26}$$

uniformly in $t$. It follows from (S.19)-(S.22) and (S.26) that

$$\begin{aligned}\hat\sigma^2(t) &= \sigma^2(t) + (1\ 0)(L_7(t))^{-1}E_1(t) + \frac{h_2^2}{2}(1\ 0)(L_7(t))^{-1}L_8(t)(\sigma^2)''(t)\\ &\quad + O_p(h_1^3) + O_p(h_2^3) + O_p\Big(\frac{\log n}{n(h_1h_2)^{1/2}}\Big) + O_p\Big(\frac{\log n}{nh_2}\Big).\end{aligned}$$

The expression of $\hat\sigma^2(t)$ in Proposition 4 also follows from the above expression.

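(iv) Representation of $\hat\sigma(s,t)$. We can proceed almost in the same way as for $\hat\sigma^2(t)$. First we have, uniformly in $i$, $j$, and $j'$,

$$\begin{aligned}\hat\epsilon_{ij}\hat\epsilon_{ij'} &= \epsilon_{ij}\epsilon_{ij'} - \sigma(T_{ij},T_{ij'}) + \sigma(T_{ij},T_{ij'}) + \epsilon_{ij}M_{ij'}^{(3)}E_0(T_{ij'}) + \epsilon_{ij'}M_{ij}^{(3)}E_0(T_{ij})\\ &\quad + \epsilon_{ij}M_{ij'}^{(1)}(\beta_0 - \hat\beta_I) + \epsilon_{ij'}M_{ij}^{(1)}(\beta_0 - \hat\beta_I) \qquad\text{(S.27)}\\ &\quad + \epsilon_{ij}h_1^2M_{ij'}^{(2)}g_0''(T_{ij'}) + \epsilon_{ij'}h_1^2M_{ij}^{(2)}g_0''(T_{ij}) \qquad\text{(S.28)}\\ &\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\Big(\frac{\log n}{nh_1}\Big)^{1/2}\Big).\end{aligned}$$

It is easy to see that the contributions of (S.27) and (S.28) to $\hat\sigma(s,t)$ are

$$O_p\Big(n^{-1/2}\Big(\frac{\log n}{nh_3^2}\Big)^{1/2}\Big) \quad\text{and}\quad O_p\Big(h_1^2\Big(\frac{\log n}{nh_3^2}\Big)^{1/2}\Big)$$

uniformly in $s$ and $t$, respectively. Therefore we have only to consider $\epsilon_{ij}\epsilon_{ij'} - \sigma(T_{ij},T_{ij'})$, $\sigma(T_{ij},T_{ij'})$, and $\epsilon_{ij}M_{ij'}^{(3)}E_0(T_{ij'}) + \epsilon_{ij'}M_{ij}^{(3)}E_0(T_{ij})$ in $\hat\epsilon_{ij}\hat\epsilon_{ij'}$.

Setting $\hat L_9(s,t) = A_{3n}(s,t)$, which is defined after (3.4), we have for some positive constants $C_1$ and $C_2$,

$$\hat L_9(s,t) = L_9(s,t) + O_p\Big(\Big(\frac{\log n}{nh_3^2}\Big)^{1/2}\Big) \quad\text{and}\quad C_1I_3 \le L_9(s,t) \le C_2I_3 \tag{S.29}$$

uniformly in $s$ and $t$, where $L_9(s,t) = E\{\hat L_9(s,t)\}$. Now we have, uniformly in $s$ and $t$,

$$\hat\sigma(s,t) = (1\ 0\ 0)(\hat L_9(s,t))^{-1}(E_2(s,t) + \mathrm{Bias}_2(s,t) + R_2(s,t)) + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\Big(\frac{\log n}{nh_1}\Big)^{1/2}\Big), \tag{S.30}$$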
where $E_2(s,t)$ is defined in Proposition 4, $\mathrm{Bias}_2(s,t)$ is the term of $\sigma(T_{ij},T_{ij'})$, and $R_2(s,t)$ is the term of $\epsilon_{ij}M_{ij'}^{(3)}E_0(T_{ij'}) + \epsilon_{ij'}M_{ij}^{(3)}E_0(T_{ij})$. It is easy to see that uniformly in $s$ and $t$,

(S.31) 




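Setting

$$\hat L_{10}(s,t) = \frac{1}{N_2h_3^2}\sum_{i=1}^n\sum_{j\ne j'}\begin{pmatrix}1\\ \frac{T_{ij}-s}{h_3}\\ \frac{T_{ij'}-t}{h_3}\end{pmatrix}\Big(\Big(\frac{T_{ij}-s}{h_3}\Big)^2,\ \frac{2(T_{ij}-s)(T_{ij'}-t)}{h_3^2},\ \Big(\frac{T_{ij'}-t}{h_3}\Big)^2\Big)K\Big(\frac{T_{ij}-s}{h_3}\Big)K\Big(\frac{T_{ij'}-t}{h_3}\Big),$$

we have, uniformly in $s$ and $t$,

$$\hat L_{10}(s,t) = L_{10}(s,t) + O_p\Big(\Big(\frac{\log n}{nh_3^2}\Big)^{1/2}\Big),$$

where $L_{10}(s,t) = E\{\hat L_{10}(s,t)\}$, which is a bounded matrix function of $(s,t)$. Then we have, as in the proof of the representation of $\hat\sigma^2(t)$, uniformly in $s$ and $t$,

$$\mathrm{Bias}_2(s,t) = \hat L_9(s,t)\begin{pmatrix}\sigma(s,t)\\ h_3\frac{\partial\sigma}{\partial s}(s,t)\\ h_3\frac{\partial\sigma}{\partial t}(s,t)\end{pmatrix} + \frac{h_3^2}{2}L_{10}(s,t)\begin{pmatrix}\frac{\partial^2\sigma}{\partial s^2}(s,t)\\ \frac{\partial^2\sigma}{\partial s\partial t}(s,t)\\ \frac{\partial^2\sigma}{\partial t^2}(s,t)\end{pmatrix} + O_p(h_3^3) + O_p\Big(h_3^2\Big(\frac{\log n}{nh_3^2}\Big)^{1/2}\Big). \tag{S.32}$$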


Finally we deal with $R_2(s,t)$ in the same way as in the proof of the representation of $\hat\sigma^2(t)$. We use the same exponential inequality for U-statistics. We should consider

$$\frac{1}{N_1N_2h_1h_3^2}\sum_{i_1=1}^n\sum_{i_2=1}^n\sum_{j_1\ne j_2}\sum_{j_3}\epsilon_{i_1j_1}\epsilon_{i_2j_3}A_{abc,i_1j_2}B_{abc,i_2j_3}K_a\Big(\frac{T_{i_2j_3}-T_{i_1j_2}}{h_1}\Big)K_b\Big(\frac{T_{i_1j_1}-s}{h_3}\Big)K_c\Big(\frac{T_{i_1j_2}-t}{h_3}\Big), \tag{S.33}$$

where $K_a(t) = t^aK(t)$, $a = 0, 1$, $b = 0, 1$, and $c = 0, 1$. Note that $A_{abc,ij}$ and $B_{abc,ij}$ are uniformly bounded functions of $X_{ij}$, $Z_{ij}$, and $T_{ij}$. This is a generalized U-statistic when we remove the summands with $i_1 = i_2$, and we recall (1.1) when we evaluate (S.33). It is easy to see that, uniformly in $s$ and $t$,

$$\frac{1}{N_1N_2h_1h_3^2}\sum_{i_1=1}^n\sum_{j_1\ne j_2}\sum_{j_3}\epsilon_{i_1j_1}\epsilon_{i_1j_3}A_{abc,i_1j_2}B_{abc,i_1j_3}K_a\Big(\frac{T_{i_1j_3}-T_{i_1j_2}}{h_1}\Big)K_b\Big(\frac{T_{i_1j_1}-s}{h_3}\Big)K_c\Big(\frac{T_{i_1j_2}-t}{h_3}\Big) = O_p\Big(\frac{1}{nh_1}\Big). \tag{S.34}$$

In the same way as when dealing with $R^{(3)}_{ab}(t)$, we obtain

$$\frac{1}{N_1N_2h_1h_3^2}\sum_{i_1\ne i_2}\sum_{j_1\ne j_2}\sum_{j_3}\epsilon_{i_1j_1}\epsilon_{i_2j_3}A_{abc,i_1j_2}B_{abc,i_2j_3}K_a\Big(\frac{T_{i_2j_3}-T_{i_1j_2}}{h_1}\Big)K_b\Big(\frac{T_{i_1j_1}-s}{h_3}\Big)K_c\Big(\frac{T_{i_1j_2}-t}{h_3}\Big) = O_p\Big(\frac{\log n}{nh_1^{1/2}h_3}\Big) \tag{S.35}$$

with $A = C_1(\log n)^km_{\max}^2/(n^2h_1h_3^2)$, $B = C_2(\log n)^km_{\max}^2/(n^{3/2}h_1^{1/2}h_3^2)$, $C = C_3/(nh_1^{1/2}h_3)$, and $x = M\log n/(nh_1^{1/2}h_3)$ in the exponential inequality. Note that we should choose sufficiently large $k$ and $M$. It follows from (S.34) and (S.35) that, uniformly in $s$ and $t$,

$$R_2(s,t) = O_p\Big(\frac{\log n}{nh_1^{1/2}h_3}\Big). \tag{S.36}$$

Note that we cannot relax the assumption $m_{\max} = O(n^{1/8})$ in Assumption A1 when we derive (S.36). It follows from (S.29)-(S.32) and (S.36) that, uniformly in $s$ and $t$,

$$\hat\sigma(s,t) - \sigma(s,t) = (1\ 0\ 0)(L_9(s,t))^{-1}E_2(s,t) + \frac{h_3^2}{2}(1\ 0\ 0)(L_9(s,t))^{-1}L_{10}(s,t)\begin{pmatrix}\frac{\partial^2\sigma}{\partial s^2}(s,t)\\ \frac{\partial^2\sigma}{\partial s\partial t}(s,t)\\ \frac{\partial^2\sigma}{\partial t^2}(s,t)\end{pmatrix}.$$

The expression given in Proposition 4 follows from the above expression.
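To make the role of the selector $(1\ 0\ 0)$ in (S.30) concrete, here is a minimal sketch of a bivariate local linear smoother applied to within-subject residual products $\hat\epsilon_{ij}\hat\epsilon_{ij'}$, $j \ne j'$. The Gaussian product kernel, the single bandwidth `h`, and the helper names are illustrative assumptions, not the exact form of (3.4).

```python
import numpy as np

def within_subject_pairs(t_list, r_list):
    """Collect (T_ij, T_ij') and products r_ij * r_ij' over j != j' within each subject."""
    pts, prods = [], []
    for t, r in zip(t_list, r_list):
        for j in range(len(t)):
            for k in range(len(t)):
                if j != k:
                    pts.append((t[j], t[k]))
                    prods.append(r[j] * r[k])
    return np.array(pts, dtype=float), np.array(prods, dtype=float)

def local_linear_2d(pts, z, s0, t0, h):
    """Local linear fit of z on (u, v) = ((T - s0)/h, (T' - t0)/h); returns the intercept."""
    u = (pts[:, 0] - s0) / h
    v = (pts[:, 1] - t0) / h
    w = np.exp(-0.5 * (u ** 2 + v ** 2))                # product Gaussian kernel weights
    D = np.column_stack([np.ones_like(u), u, v])        # local design (1, u, v)
    coef = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * z))
    return coef[0]                                       # (1 0 0) picks the level at (s0, t0)
```

Because the local design contains $(1, u, v)$, the smoother reproduces any surface that is linear in $(s,t)$ exactly; the bias terms in (S.32) come only from second-order curvature.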
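S.4. Proofs of Lemmas 2-8. First we state some results on $\hat\Sigma_i$. Set

(S.37)

Then we have from Proposition 4 that, uniformly in $i$,

$$\max\{|\lambda_{\min}(\hat\Sigma_i - \Sigma_i)|,\, |\lambda_{\max}(\hat\Sigma_i - \Sigma_i)|\} = O_p(m_i\delta_n).$$

Recall that

$$\hat\Sigma_i^{-1} - \Sigma_i^{-1} = -\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1} + \hat\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}.$$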
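The identity recalled above is exact, not merely a first-order approximation, and it is easy to confirm numerically. The check below uses an arbitrary positive definite matrix in place of $\Sigma_i$ and a small symmetric perturbation in place of $\hat\Sigma_i - \Sigma_i$ (both our choices for illustration). It also shows that the second term is of smaller order than the first, which is what (S.38) and (S.39) exploit.

```python
import numpy as np

def inverse_difference_terms(S_hat, S):
    """Return the two terms of S_hat^{-1} - S^{-1} = -S^{-1} D S^{-1} + S_hat^{-1} D S^{-1} D S^{-1},
    where D = S_hat - S."""
    S_inv, S_hat_inv = np.linalg.inv(S), np.linalg.inv(S_hat)
    D = S_hat - S
    first = -S_inv @ D @ S_inv
    second = S_hat_inv @ D @ S_inv @ D @ S_inv
    return first, second

rng = np.random.default_rng(0)
m = 6
L = rng.standard_normal((m, m))
S = L @ L.T + m * np.eye(m)                  # a positive definite stand-in for Sigma_i
E = rng.standard_normal((m, m))
S_hat = S + 1e-2 * (E + E.T)                  # small symmetric perturbation
first, second = inverse_difference_terms(S_hat, S)
```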


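We have from Assumption A4 and Proposition 4 that, uniformly in $i$,

$$|\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}|_{\max} = O_p(\delta_n), \tag{S.38}$$

$$|\hat\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}|_{\max} = O_p(m_i\delta_n^2), \tag{S.39}$$

where $|A|_{\max} = \max_{i,j}|a_{ij}|$ for any matrix $A = (a_{ij})$. Besides, it follows from Assumption A4 that, uniformly in $i$,

$$\max\{|\lambda_{\min}(\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1})|,\, |\lambda_{\max}(\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1})|\} = O_p(m_i\delta_n). \tag{S.40}$$

We also have the same result for $\hat\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}$ as in (S.40), with $m_i\delta_n$ replaced by $(m_i\delta_n)^2$. Proposition 4 also implies that each element of $\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}$ has the form

$$\sum_{j=1}^{m_i}(\cdots) + \sum_{j\ne j'}(\cdots) + \sum_{j=1}^{m_i}(\cdots) + \sum_{j\ne j'}(\cdots) + D_i^{(5)}, \tag{S.41}$$

where

$$D_i^{(5)} = m_iO_p\Big(h_1^3 + h_2^3 + h_3^3 + \frac{\log n}{nh_1} + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big)$$

uniformly in $i$.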

We state the following two useful facts before we start proving Lemmas 2-8; both hold uniformly in $l$:

$$\frac{1}{n}\sum_{i=1}^n m_i\sum_{j=1}^{m_i}|W_{ijl}| = O_p(K_n^{-1}), \tag{S.42}$$

$$\frac{1}{n}\sum_{i=1}^n m_i\sum_{j_1=1}^{m_i}\sum_{j_2=1}^{m_i}|W_{ij_1l}||\epsilon_{ij_2}| = O_p(K_n^{-1}), \tag{S.43}$$

where $W_{ijl}$ denotes the $l$th element of $W_{ij}$. We can prove them in the same way, except that we need a kind of truncation argument when showing (S.43), so we outline only the proof of (S.42). To prove (S.42), we evaluate the expectation and variance and apply the Bernstein inequality. First note that we have, uniformly in $l$,

$$E\Big\{\frac{1}{n}\sum_{i=1}^n m_i\sum_{j=1}^{m_i}|W_{ijl}|\Big\} = O(K_n^{-1}).$$

This follows from the local property of the B-spline basis and Assumption A2. In addition, since we have from Assumption A2 that

$$\frac{1}{n^2}\sum_{i=1}^n m_i^2\sum_{j=1}^{m_i}E\{|W_{ijl}|^2\} + \frac{1}{n^2}\sum_{i=1}^n m_i^2\sum_{j_1\ne j_2}E\{|W_{ij_1l}||W_{ij_2l}|\} \le C\max\Big(\frac{m_{\max}^2}{nK_n},\ \frac{m_{\max}^3}{nK_n^2}\Big),$$

the variance is bounded from above by $C_1n^{-19/20}$ uniformly in $l$. Each summand is bounded from above by $C_2m_{\max}^2/n = O(n^{-1/2})$. Hence (S.42) and the uniformity in $l$ follow from the Bernstein inequality.
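For reference, the Bernstein inequality used above states that for independent $X_1,\ldots,X_n$ with $|X_i - EX_i| \le M$ and $\sum_i\mathrm{Var}(X_i) \le v$, $P(|\sum_i(X_i - EX_i)| \ge x) \le 2\exp\{-x^2/(2(v + Mx/3))\}$. The Monte Carlo comparison below, with uniform summands chosen purely for illustration, shows the bound holding, and holding conservatively.

```python
import numpy as np

def bernstein_bound(x, var_sum, M):
    """Two-sided Bernstein bound on P(|sum_i (X_i - E X_i)| >= x) when |X_i - E X_i| <= M."""
    return 2.0 * np.exp(-0.5 * x ** 2 / (var_sum + M * x / 3.0))

rng = np.random.default_rng(1)
n, reps, x = 200, 20000, 20.0
sums = rng.uniform(-1.0, 1.0, size=(reps, n)).sum(axis=1)   # centered, |X_i| <= 1, Var(X_i) = 1/3
empirical = np.mean(np.abs(sums) >= x)
bound = bernstein_bound(x, var_sum=n / 3.0, M=1.0)           # about 0.131
```

The variance term $v$ is what makes the inequality useful here: in (S.42) the variance is much smaller than the squared bound on each summand, so the tail is controlled at the rate of the variance.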


Proof of Lemma 2. We can verify the result on $n^{-1}h_{12,kl}$ by using the local property of the B-spline basis and the Bernstein inequality for independent bounded random variables. Since

$$\frac{1}{n}(\hat H_{12} - H_{12}) = -\frac{1}{n}\sum_{i=1}^nX_i^T\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}W_i + \frac{1}{n}\sum_{i=1}^nX_i^T\hat\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}W_i,$$

the desired result on $n^{-1}(\hat h_{12,kl} - h_{12,kl})$ follows from (S.38), (S.39), and (S.42). The results on the Euclidean norm follow from those on the elements. Hence the proof is complete.

Proof of Lemma 3. We have from Assumption A4 that

$$\frac{C_1}{n}\sum_{i=1}^n\frac{1}{m_i}W_i^TW_i \le \frac{1}{n}H_{22} \le \frac{C_2}{n}\sum_{i=1}^nW_i^TW_i \tag{S.44}$$

for some positive constants $C_1$ and $C_2$ and for $k = 0, 1$,



Thus the first result follows from Assumptions A2 and A3 and the standard arguments on B-spline bases as in the proofs of Lemmas A.1 and A.2 of [12]. Since we have

$$\frac{1}{n}(\hat H_{22} - H_{22}) = -\frac{1}{n}\sum_{i=1}^nW_i^T\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}W_i + \frac{1}{n}\sum_{i=1}^nW_i^T\hat\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}W_i,$$

the second result follows from (S.40), inequalities similar to (S.44), and Assumptions A2 and A3. The third result follows from the first and second results. Finally we deal with the fourth result. Note that

$$\begin{aligned}(n^{-1}\hat H_{22})^{-1} - (n^{-1}H_{22})^{-1} &= (n^{-1}H_{22})^{-1}(n^{-1}H_{22} - n^{-1}\hat H_{22})(n^{-1}H_{22})^{-1}\\ &\quad + (n^{-1}\hat H_{22})^{-1}(n^{-1}H_{22} - n^{-1}\hat H_{22})(n^{-1}H_{22})^{-1}(n^{-1}H_{22} - n^{-1}\hat H_{22})(n^{-1}H_{22})^{-1}.\end{aligned}\tag{S.45}$$

By using the first, second, and third results and (S.45), we obtain the fourth one. Hence the proof is complete.

Proof of Lemma 4. The first result follows from (S.40). The second one follows from Lemmas 2 and 3. The last one follows from the first two.


Proof of Lemma 5. The first result follows from the fact that

$$\frac{C_1}{n}\sum_{i=1}^n\frac{1}{m_i}W_i^TW_i \le \frac{1}{n}\sum_{i=1}^nW_i^T\Sigma_i^{-1}W_i \le \frac{C_2}{n}\sum_{i=1}^nW_i^TW_i$$

for some positive constants $C_1$ and $C_2$. Next note that

$$\frac{1}{\sqrt n}\sum_{i=1}^nW_i^T(\hat\Sigma_i^{-1} - \Sigma_i^{-1})\epsilon_i = -\frac{1}{\sqrt n}\sum_{i=1}^nW_i^T\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}\epsilon_i + \frac{1}{\sqrt n}\sum_{i=1}^nW_i^T\hat\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}\epsilon_i. \tag{S.46}$$

By employing (S.39) and (S.43), we can prove that the elements of the second term on the right-hand side are uniformly $O_p\big(\sqrt nK_n^{-1}(h_2^4 + h_3^4 + \log n/(nh_2) + \log n/(nh_3^2))\big)$. Thus the norm of this $qK_n$-dimensional vector has the stochastic order of

$$\sqrt{\frac{n}{K_n}}\Big(h_2^4 + h_3^4 + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big). \tag{S.47}$$

According to Proposition 4, the first term on the right-hand side of (S.46) can be decomposed into

$$\frac{1}{\sqrt n}\sum_{i=1}^nW_i^TQ_{1i}\epsilon_i + \frac{1}{\sqrt n}\sum_{i=1}^nW_i^TQ_{2i}\epsilon_i + \frac{1}{\sqrt n}\sum_{i=1}^nW_i^TQ_{3i}\epsilon_i, \tag{S.48}$$

where $Q_{1i}$ corresponds to the first and second terms in (S.41), $Q_{2i}$ corresponds to the third and fourth terms in (S.41), and $Q_{3i}$ corresponds to the fifth term in (S.41). Proposition 4 implies

$$Q_{1i} = h_2^2Q_{1i}^{(2)} + h_3^2Q_{1i}^{(3)},$$

where, for $s = 2, 3$,

$$\max\{|\lambda_{\min}(Q_{1i}^{(s)})|,\, |\lambda_{\max}(Q_{1i}^{(s)})|\} = O(m_i)$$

uniformly in $i$. Besides, $Q_{1i}^{(s)}$ depends only on $T_i$ for $s = 2, 3$. The $(k,l)$ element of $Q_{1i}^{(s)}$ has the form

$$\sum_{j=1}^{m_i}(\cdots) + \sum_{j\ne j'}\sigma_i^{kj}\sigma_i^{j'l}e_2(T_{ij},T_{ij'}),$$

where $\Sigma_i^{-1} = (\sigma_i^{kl})$. Note that, uniformly in $l$ and $i$,

$$\sum_{k=1}^{m_i}(\sigma_i^{kl})^2 = O(1).$$

Uniformly in $i$, the elements of $Q_{3i}$, that is, $D_i^{(5)}$ in (S.41), have the order of

$$m_iO_p\Big(h_1^3 + h_2^3 + h_3^3 + \frac{\log n}{nh_1} + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big).$$

We can prove as in the proof of Lemma 3 that, for $s = 2, 3$,

$$\frac{C_1}{K_n}I_{qK_n} \le \mathrm{Cov}\Big(n^{-1/2}\sum_{i=1}^nW_i^TQ_{1i}^{(s)}\epsilon_i\Big) \le \frac{C_2}{K_n}I_{qK_n}$$

for some positive constants $C_1$ and $C_2$. Hence we have

$$\Big|n^{-1/2}\sum_{i=1}^nW_i^TQ_{1i}\epsilon_i\Big| = O_p(h_2^2 + h_3^2). \tag{S.49}$$

Similarly to the second term on the right-hand side of (S.46), we can demonstrate by using (S.43) that

$$\Big|n^{-1/2}\sum_{i=1}^nW_i^TQ_{3i}\epsilon_i\Big| = O_p\Big(\sqrt{\frac{n}{K_n}}\Big(h_1^3 + h_2^3 + h_3^3 + \frac{\log n}{nh_1} + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big)\Big). \tag{S.50}$$

Finally we evaluate the second term of (S.48), which has the structure of a V-statistic. By exploiting this structure, we evaluate the expectations and the variances of the elements by using Assumption A2. Then we have

$$\Big|n^{-1/2}\sum_{i=1}^nW_i^TQ_{2i}\epsilon_i\Big| = O_p\Big(\frac{1}{\sqrt n\,h_2} + \frac{1}{\sqrt{nK_n}\,h_2} + \frac{1}{\sqrt{nK_n}\,h_3^2}\Big).$$

The second result follows from (S.47), (S.49), (S.50), and the above equality.


Proof of Lemma 6. This lemma can be proved in the same way as Lemma 
5 and the details are omitted. 


Proof of Lemma 7. From the definition of $\gamma^*$ given after (5.5), we have

$$|W_{ij}\gamma^* - Z_{ij}^Tg_0(T_{ij})| = O(K_n^{-2})$$

uniformly in $i$ and $j$. The above equality and (S.42) imply that the elements of

$$\frac{1}{n}\sum_{i=1}^nW_i^T\Sigma_i^{-1}(W_i\gamma^* - (Z^Tg_0)_i)$$

are uniformly $O_p(K_n^{-3})$, and the first result follows from this. As for the second result, first we note that

$$|\hat\Sigma_i^{-1} - \Sigma_i^{-1}|_{\max} = O_p(\delta_n)$$

uniformly in $i$ from (S.38) and (S.39). Recall that $\delta_n$ is defined in (S.37). Thus the elements of $W_i^T(\hat\Sigma_i^{-1} - \Sigma_i^{-1})(W_i\gamma^* - (Z^Tg_0)_i)$ are bounded uniformly in $l$ by

$$CK_n^{-2}\delta_nm_i\sum_{j=1}^{m_i}|W_{ijl}|$$

with probability tending to 1 for some positive constant $C$. Hence the second result follows from (S.42).

Proof of Lemma 8. This lemma can be proved in the same way as Lemma
7 and the details are omitted. 


S.5. Theoretical results for general link functions. We state the results of Section 2 for general link functions when $m_i$ is uniformly bounded and the sub-Gaussian assumption, Assumption A6' here, is satisfied. Note that we have no counterpart of Theorem 1 for general link functions even when $m_i$ is uniformly bounded.

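Let $v_1$ and $v_2$ be two processes, each taking a scalar stochastic value at $T_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, m_i$. Then we define two inner products of $v_1$ and $v_2$ by

$$\langle v_1, v_2\rangle_n = \frac{1}{n}\sum_{i=1}^nv_{1i}^TA_{0i}V_i^{-1}A_{0i}v_{2i} \quad\text{and}\quad \langle v_1, v_2\rangle_A = E\{\langle v_1, v_2\rangle_n\},$$

where $v_{1i}$ and $v_{2i}$ are the $m_i$-vectors of the values of $v_1$ and $v_2$ for the $i$th subject, and

$$A_{0i} = \mathrm{diag}\big(\mu'(X_{i1}^T\beta_0 + Z_{i1}^Tg_0(T_{i1})), \ldots, \mu'(X_{im_i}^T\beta_0 + Z_{im_i}^Tg_0(T_{im_i}))\big).$$

The associated norms are then defined by

$$\|v\|_n = (\langle v, v\rangle_n)^{1/2} \quad\text{and}\quad \|v\|_A = (\langle v, v\rangle_A)^{1/2}.$$

We now define the projections, with respect to $\|\cdot\|_A$, of the $k$th element of $X$ onto $Z^TG$ and $Z^TG_B$ by

$$\Pi_AX_k = \mathop{\mathrm{argmin}}_{g\in G}\|X_k - Z^Tg\|_A \quad\text{and}\quad \Pi_{An}X_k = \mathop{\mathrm{argmin}}_{g\in G_B}\|X_k - Z^Tg\|_A,$$

where

$$\|X_k - Z^Tg\|_A^2 = E\Big\{\frac{1}{n}\sum_{i=1}^n(\underline{X}_{ik} - (Z^Tg)_i)^TA_{0i}V_i^{-1}A_{0i}(\underline{X}_{ik} - (Z^Tg)_i)\Big\},$$

with $\underline{X}_{ik} = (X_{i1k}, \ldots, X_{im_ik})^T$ and $(Z^Tg)_i = (Z_{i1}^Tg(T_{i1}), \ldots, Z_{im_i}^Tg(T_{im_i}))^T$. We denote these projections by $\varphi^*_{Ak} = \Pi_AX_k$ and $\varphi_{Ak} = \Pi_{An}X_k$, and define another one by

$$\hat\varphi_{Ak} = \hat\Pi_{An}X_k,$$

where

$$\hat\Pi_{An}X_k = \mathop{\mathrm{argmin}}_{g\in G_B}\|X_k - Z^Tg\|_n^2.$$

The arguments in Section 5.2 also apply to this $\varphi^*_{Ak}$.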
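In coefficient form, the empirical projection $\hat\Pi_{An}X_k$ is a weighted least squares problem. Writing $Z_{ij}^Tg(T_{ij}) = W_{ij}\gamma$ for $g \in G_B$, as in the proof of Lemma 1 (iv), one minimizes $\sum_i(\underline{X}_{ik} - W_i\gamma)^TA_{0i}V_i^{-1}A_{0i}(\underline{X}_{ik} - W_i\gamma)$. The sketch below solves the normal equations. The logistic link used to form $A_{0i}$ and the helper names `weighted_projection` and `logistic_weight` are assumptions for illustration.

```python
import numpy as np

def weighted_projection(x_list, W_list, Omega_list):
    """gamma minimizing sum_i (x_i - W_i gamma)^T Omega_i (x_i - W_i gamma)."""
    A = sum(W.T @ Om @ W for W, Om in zip(W_list, Omega_list))
    b = sum(W.T @ Om @ x for x, W, Om in zip(x_list, W_list, Omega_list))
    return np.linalg.solve(A, b)

def logistic_weight(eta, V):
    """Omega_i = A_0i V_i^{-1} A_0i with A_0i = diag(mu'(eta)), mu the logistic function (assumed link)."""
    mu = 1.0 / (1.0 + np.exp(-eta))
    A0 = np.diag(mu * (1.0 - mu))             # mu'(eta) for the logistic link
    return A0 @ np.linalg.inv(V) @ A0
```

At the solution, the residuals $\underline{X}_{ik} - W_i\hat\gamma$ are orthogonal to every column of the $W_i$ in the $\langle\cdot,\cdot\rangle_n$ sense, which is the defining property of the projection.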

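Some matrices are needed to present Proposition S.1, and we define them here. Let

$$\begin{pmatrix}\sum_{i=1}^nX_i^TA_{0i}V_i^{-1}A_{0i}X_i & \sum_{i=1}^nX_i^TA_{0i}V_i^{-1}A_{0i}W_i\\ \sum_{i=1}^nW_i^TA_{0i}V_i^{-1}A_{0i}X_i & \sum_{i=1}^nW_i^TA_{0i}V_i^{-1}A_{0i}W_i\end{pmatrix} = \begin{pmatrix}H_{11} & H_{12}\\ H_{21} & H_{22}\end{pmatrix} \quad\text{(say)},$$

$H_{11\cdot2} = H_{11} - H_{12}H_{22}^{-1}H_{21}$, and $H^{11} = (H_{11\cdot2})^{-1}$.

Let $\Omega_{Vn}$ be a $p\times p$ matrix whose $(k,l)$th element is

$$\frac{1}{n}\sum_{i=1}^nE\{(\underline{X}_{ik} - (Z^T\varphi^*_{Ak})_i)^TA_{0i}V_i^{-1}A_{0i}(\underline{X}_{il} - (Z^T\varphi^*_{Al})_i)\}.$$

Note that $n^{-1}H_{11\cdot2}$ is an estimate of $\Omega_{Vn}$. We assume that there exists a $p\times p$ positive definite matrix $\Omega_V$ such that

$$\lim_{n\to\infty}\Omega_{Vn} = \Omega_V. \tag{S.51}$$

We present Propositions S.1-S.3 before stating the assumptions for these propositions. By using Lemma S.1, we can prove Proposition S.1 based on the same arguments as those in [4].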

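Proposition S.1. (Asymptotic normality of $\hat\beta_V$) Under Assumption S in Section 2 for the norm here, (S.51), and Assumptions A1', A2', A3, A4', A5', and A6', we have
$$
\hat\beta_V=\beta_0+H^{11}\sum_{i=1}^n(X_i-W_iH_{22}^{-1}H_{21})^T\Lambda_{0i}V_i^{-1}\epsilon_i+o_p\Big(\frac{1}{\sqrt n}\Big).
$$
We also have
$$
\Gamma_V^{-1/2}(\hat\beta_V-\beta_0)\xrightarrow{d}N(0,I_p),
$$
where $\Gamma_V$ is
$$
\Gamma_V=H^{11}\sum_{i=1}^n\big\{(X_i-W_iH_{22}^{-1}H_{21})^T\Lambda_{0i}V_i^{-1}\Sigma_iV_i^{-1}\Lambda_{0i}(X_i-W_iH_{22}^{-1}H_{21})\big\}H^{11}.
$$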


We give in Proposition S.2 the semiparametric efficiency bound for estimation of $\beta_0$. It can be proved in the same way as Lemma 1 of [4], and the proof is omitted. We denote the semiparametric efficient score function of $\beta$ by
$$
l^*_\beta=(l^*_{\beta1},\dots,l^*_{\beta p})^T.
$$
Its expression is given in Proposition S.2. When $V_i=\Sigma_i$, we denote $\varphi^*_{\Lambda k}(t)$ by $\varphi_{\mathrm{eff},k}(t)$.


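Proposition S.2. (Semiparametric efficiency bound) Under the same assumptions as in Proposition S.1, we have
$$
l^*_{\beta k}=\sum_{i=1}^n(X_{ik}-(Z^T\varphi_{\mathrm{eff},k})_i)^T\Lambda_{0i}\Sigma_i^{-1}\{Y_i-\mu(X_i\beta_0+(Z^Tg_0)_i)\},
$$
and the semiparametric efficient information matrix for $\beta$ is given by
$$
\lim_{n\to\infty}\frac{1}{n}E\{l^*_\beta(l^*_\beta)^T\}=\Omega_\Sigma\quad\text{with }V_i=\Sigma_i\text{ in (S.51)}.
$$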

Proposition S.3 is parallel to Proposition 3. It can be proved in the same way as Corollary 1 of [4], and it also follows from Proposition S.1 and Lemma S.1(vii). Thus the proof is omitted.

Proposition S.3. (Oracle efficient estimator) Under the same assumptions as in Proposition S.1, we have, with $V_i=\Sigma_i$ in (2.2),
$$
\sqrt n\,\Omega_\Sigma^{1/2}(\hat\beta_\Sigma-\beta_0)\xrightarrow{d}N(0,I_p).
$$

Now we describe assumptions for the above propositions. Here we need Assumption A6' since we need some results from empirical process theory in dealing with general link functions.

Assumption A1'.

(i) $\mu(x)$ is twice continuously differentiable and $\inf_{x\in\mathbb R}\mu'(x) > 0$.

(ii) For some positive constant $C_M$, we have $\limsup_{|x|\to\infty}|\mu(x)|/|x|^{C_M} < \infty$.

Assumption A2'. The joint density functions $f_{ij}(t)$ and $f_{ijj'}(s,t)$ are uniformly bounded, and we have for some positive constants $C_{B1}$ and $C_{B2}$,
$$
C_{B1} < \frac{1}{n}\sum_{i=1}^n\sum_{j=1}^{m_i}f_{ij}(t) < C_{B2}\ \text{on }[0,1]
\quad\text{and}\quad
C_{B1} < \frac{1}{n}\sum_{i=1}^n\sum_{j\ne j'}f_{ijj'}(s,t) < C_{B2}\ \text{on }[0,1]^2.
$$

Assumption A4'. For some positive constants $C_{B5}$ and $C_{B6}$, we have uniformly in $i$,
$$
C_{B5} < \quad < C_{B6}.
$$

Assumption A5'. For some positive constants $C_{B7}$ and $C_{B8}$, we have uniformly in $i$,
$$
C_{B7} < \lambda_{\min}(V_i)\le\lambda_{\max}(V_i) < C_{B8}.
$$

Assumption A6'. For some positive constants $C_{B10}$ and $C_{B11}$, we have uniformly in $i$,
$$
\max_{1\le j\le m_i}C_{B10}E\{\exp(|\epsilon_{ij}|^2/C_{B10})-1\} < C_{B11}.
$$

To prove Proposition S.1, we only have to proceed as in [3], replacing their $Z_{ij}$, $Z_i$ and projections with $W_{ij}$, $W_i$ and $Z^T\varphi_{\Lambda k}$, respectively. We just state the relevant changes and remarks in the following:

(i) Lemmas S.2-S.4 of [3]: We reorganize these lemmas in Lemma S.1 given later. Its (i)-(iii), (iv) and (vi) correspond to Lemma S.2, the latter half of Lemma S.3 and Lemma S.4 of [3], respectively. The former half of Lemma S.3 of [3] seems to be used in their Corollary 1. However, it can be relaxed to (v) of Lemma S.1 here.

(ii) Lemma S.8 of [3]: The regressors $X_{ij}$ and $W_{ij}$ still form a VC class, and we can proceed completely in the same way as in [3].

We state Lemma S.1 in the following. It can be proved in the same way as Lemma 1.

Lemma S.1. Assume that Assumptions A1', A2', A3, A4' and A5' hold. Then we have the following results.

(i) There are positive constants $C_1$ and $C_2$ such that
$$
C_1\|g\|_{G,2}\le\|Z^Tg\|_\Lambda\le C_2\|g\|_{G,2}
$$
for any $g\in G$.

(ii) There are positive constants $C_3$ and $C_4$ such that
$$
\|g\|_{G,\infty}^2\le C_3K_n\|g\|_{G,2}^2\le C_4K_n\|Z^Tg\|_\Lambda^2
$$
for any $g\in G_B$.

(iii) There is a positive constant $C_5$ such that for any $\beta\in\mathbb R^p$ and $g\in G_B$,
$$
\|X^T\beta+Z^Tg\|_\infty\le C_5K_n^{1/2}\|X^T\beta+Z^Tg\|_\Lambda,
$$
where $\|v\|_\infty=\max_{i,j}|v_{ij}|$. Besides, we have for some positive constant $C_6$,
$$
\|v\|_\Lambda\le C_6\|v\|_\infty.
$$

(iv) We have
$$
\sup_{g_1,g_2\in G_B}\frac{|\langle Z^Tg_1,Z^Tg_2\rangle_n-\langle Z^Tg_1,Z^Tg_2\rangle_\Lambda|}{\|Z^Tg_1\|_\Lambda\|Z^Tg_2\|_\Lambda}=O_p\big(\sqrt{K_n\log n/n}\big).
$$

(v) For any positive constant $M$, we have
$$
\langle X_j-Z^Tg_j,X_k-Z^Tg_k\rangle_n-\langle X_j-Z^Tg_j,X_k-Z^Tg_k\rangle_\Lambda=o_p(1)
$$
uniformly in $g_j\in G_B$ and $g_k\in G_B$ satisfying $\|g_j\|_{G,2}\le M$ and $\|g_k\|_{G,2}\le M$, respectively.

(vi) For any stochastic process $\delta_n$ taking values at $T_{ij}$ such that $\|\delta_n\|_\infty$ is uniformly bounded in $n$ and $\{\delta_{n,ij}\}_{j=1}^{m_i}$ are mutually independent in $i$, we have
$$
\sup_{g\in G_B}\frac{|\langle\delta_n,Z^Tg\rangle_n-\langle\delta_n,Z^Tg\rangle_\Lambda|}{\|Z^Tg\|_\Lambda}=O_p\big(\sqrt{K_n/n}\big)\|\delta_n\|_\infty.
$$
(vii) We also have Assumption S for the norm here. Then we have, for $k=1,\dots,p$, $\|\hat\varphi_{\Lambda k}\|_\infty=O_p(1)$,
$$
\|Z^T(\hat\varphi_{\Lambda k}-\varphi_{\Lambda k})\|_n=O_p(1),\quad\text{and}\quad\|Z^T(\varphi^*_{\Lambda k}-\varphi_{\Lambda k})\|_\Lambda=o_p(1).
$$

In semivarying coefficient modeling of longitudinal/clustered data, the parametric component, which involves unknown constant coefficients, is usually of primary interest. First we study the semiparametric efficiency bound for estimation of the constant coefficients in a general setup. It can be achieved by spline regression using the true within-subject covariance matrices, which are often unavailable in reality. Thus we propose an estimator when the covariance matrices are unknown and depend only on the index variable. To achieve this goal, we estimate the covariance matrices using residuals obtained from a preliminary estimation based on working independence and both spline and local linear regression. Then, using the covariance matrix estimates, we employ spline regression again to obtain our final estimator. It achieves the semiparametric efficiency bound under the normality assumption and has the smallest asymptotic covariance matrix among a class of estimators even when normality is violated.

Our theoretical results hold either when the number of within-subject observations diverges or when it is uniformly bounded. In addition, the local linear estimator of the nonparametric component is superior to the spline estimator in terms of numerical performance. The proposed method is compared with the working independence estimator and some existing methods via simulations and an application to a real data example.

CHENG ET AL.


1. Introduction. Suppose we have a scalar response $Y$ and two covariate vectors, a $p$-dimensional $X$ and a $q$-dimensional $Z$. Longitudinal data consist of $(Y_{ij},X_{ij},Z_{ij},T_{ij})$, $i=1,\dots,n$, $j=1,\dots,m_i$, where $Y_{ij}$, $X_{ij}=(X_{ij1},\dots,X_{ijp})^T$ and $Z_{ij}=(Z_{ij1},\dots,Z_{ijq})^T$ are respectively the values of $Y$, $X$ and $Z$ of the $i$th subject at the $j$th observation time $T_{ij}\in[0,1]$. Such data are commonly acquired for various purposes, such as evidence-based knowledge discovery and empirical study, in a wide range of subject areas. When the subjects are replaced by clusters and the $T_{ij}$'s are observations on some index variable other than time, the data are usually called clustered data. We assume that all the covariates are uniformly bounded for technical reasons. Besides, we let $Z_{ij1}=1$ and suppose $X_{ij}$ has no constant element for all $i$ and $j$.

For $i=1,\dots,n$, denote
$$
X_i=(X_{i1},\dots,X_{im_i})^T,\quad Z_i=(Z_{i1},\dots,Z_{im_i})^T,\quad\text{and}\quad T_i=(T_{i1},\dots,T_{im_i})^T.
$$
A popular model for longitudinal data analysis is the semivarying coefficient model, which is specified by
$$
E(Y_{ij}\mid X_{ij},Z_{ij},T_{ij},X_i,Z_i,T_i)=E(Y_{ij}\mid X_{ij},Z_{ij},T_{ij})=\mu(X_{ij}^T\beta+Z_{ij}^Tg(T_{ij}))=\mu_{ij}, \tag{1.1}
$$
where $A^T$ stands for the transpose of a matrix $A$. In model (1.1), $\mu(x)$ is a known strictly increasing smooth link function, $\beta$ is an unknown regression coefficient vector, and $g(t)=(g_1(t),\dots,g_q(t))^T$ is a vector of unknown smooth functions. Define
$$
\epsilon_i=(\epsilon_{i1},\dots,\epsilon_{im_i})^T=Y_i-\mu_i\quad\text{and}\quad\Sigma_i=\mathrm{Var}(\epsilon_i\mid X_i,Z_i,T_i), \tag{1.2}
$$
where $Y_i=(Y_{i1},\dots,Y_{im_i})^T$, $\mu_i=(\mu_{i1},\dots,\mu_{im_i})^T$, and $\Sigma_i$ is an $m_i\times m_i$ positive definite matrix depending on $X_i$, $Z_i$ and $T_i$, $i=1,\dots,n$. This is a standard marginal model in longitudinal data analysis [24].

Model (1.1) consists of a parametric component, which provides information on the constant impacts of some important covariates, and a nonparametric component, which captures the dynamic impacts of the other covariates. In this way the model is able to reflect unknown nonlinear structures in the data while retaining interpretability similar to that of classical linear models. There is an extensive literature on the variable selection, structure identification, estimation, and inference issues [6, 8, 12, 22, 25]. In particular, the parametric component is often of primary interest, while the nonparametric component is viewed as a nuisance part. In this regard, it is well known that assuming independence or some misspecified working covariance structure yields less efficient estimation of the constant coefficients. Therefore, a substantial portion of the existing literature has aimed at improving the efficiency via modeling and estimating the within-subject covariance structure [6, 7, 10, 18, 26, 27, 28], which is itself a challenging task.

In this article, we focus on the identity link function and contribute to the efficient estimation problem for model (1.1) in three directions. First, we allow some of the $m_i$'s to tend to infinity. As far as we know, this setup has not been treated before, and the problem is nontrivial. Our results also hold when the $m_i$'s are uniformly bounded and $\epsilon_i$ satisfies the sub-Gaussian property. See the supplement [5] for the details. When all of the $m_i$'s are diverging, that is, when we have densely observed data, the problem becomes a kind of functional data problem and is beyond the scope of this paper. Second, we study the explicit expression of the semiparametric efficiency bound for estimation of $\beta$ and the asymptotic normality of the generalized estimating equations (GEE) spline estimator under general covariance structures and error distributions. Using the true covariance matrices in the GEE estimation leads to optimality among all GEE estimators of the parametric component. Furthermore, it achieves the semiparametric efficiency bound when the errors are conditionally normal. Our results parallel those for partially linear and partially linear additive models given by [13] and [4], respectively. Those models are among a rich variety of semiparametric ways of modeling longitudinal data, and they differ from semivarying coefficient models in that their nonparametric components admit more direct additive expressions. Partially linear (additive) models were also considered by [14, 15, 16, 17, 23], among which [14, 15, 16, 23] used kernel methods and [17] used spline estimation.

Our third contribution is to deal with adaptive efficient estimation when the within-subject covariance matrices are estimated nonparametrically using the data at hand. Notice that [4] ignored this practical issue and did not consider estimation of the covariances, and [13] suggested using some parametric specification that can be estimated $\sqrt n$-consistently. We consider the case where the nonparametric within-subject covariance matrices depend only on the observation times and not on the other covariates. Such assumptions are reasonable because we do not assume that the observation times are regular across different subjects or that they are dense. Indeed, with irregular and/or sparse observation times, estimating the covariances in a completely nonparametric way, by letting them depend nonparametrically on all of the $T_{ij}$, $X_{ij}$ and $Z_{ij}$, is particularly problematic and even unreliable because of the curse of dimensionality. Our covariance estimator is constructed from residuals yielded by an initial estimation. The final estimator of the true value of $\beta$ is then given by plugging the covariance estimates into the GEE spline estimation. We show the asymptotic equivalence of our final estimator to the oracle efficient estimator, which uses the true covariance matrices in the GEE spline estimation.

The above result is partly motivated by the study of [14] on efficient estimation in partially linear models under the same nonparametric covariance structure. However, the kernel profile method taken by [14] involves only local linear regression, and thus, to achieve semiparametric efficiency, it requires some complicated iterative backfitting calculation except for the identity link function [15, 16]. By comparison, our approach to estimating the parametric and nonparametric components in the mean function is different and much simpler. We combine spline approximation and local linear estimation to avoid complicated calculation while retaining the asymptotic equivalence property. To the best of our knowledge, there are no existing results for semivarying coefficient models, especially when some of the $m_i$'s tend to infinity or when the $\Sigma_i$'s are estimated.

Our final estimator is a kind of feasible generalized least squares (FGLS) estimator, since we replace the within-subject covariance matrices with their nonparametric estimates. Even if our assumption on the covariance matrices fails to hold, it is still asymptotically normal under mild conditions and still makes use of some information on the covariance matrices. For example, if the covariances depend on some time-dependent covariates, such effects are still captured by our method to some extent. In this sense, compared with existing methods, which use either parametrically estimated or ad hoc covariance matrices [7, 18, 21], our approach is more adaptive to the unknown covariance matrices. A promising cluster bootstrap inference method was proposed by [2]; however, it assumes some parametric within-cluster covariance structure. In the case where there is one observation for each subject/cluster, our assumption on the covariance matrices reduces to that of [20], which also suggested improving the efficiency in a similar manner.

Our simulation study shows that the proposed method numerically outperforms the working independence approach and the quadratic inference functions (QIF) method of [18], and that it performs close to the oracle estimator, which uses the true covariance matrices. Note that, while the QIF procedure is suitable when there is some kind of regularity and stationarity in the error process, our procedure adapts to both non-stationarity and irregularity. We also applied our method to the CD4 count dataset and identified some interesting new effects not detected by the working independence approach.

After the semiparametric efficient estimation, we can estimate and make inference on the nonparametric component in the same way as for varying coefficient models, using the difference between the response and the estimated parametric part [25]. When $p$ and $q$ are both diverging and the model is sparse, [6] suggested a simultaneous variable selection and structure identification procedure and showed its consistency. By combining that method with the proposed estimation procedure and putting together the corresponding consistency and efficiency results, we obtain an efficient estimation procedure in this case.

The organization of this paper is as follows. In Section 2 we derive the semiparametric efficiency bound for the constant coefficient vector $\beta$ and the asymptotic normality of GEE spline estimators. In Section 3, we propose an efficient estimator of $\beta$ when the errors have some general covariance structure and state its asymptotic equivalence to the oracle estimator, which assumes the covariance matrices are known. Section 4 summarizes and discusses the results of our simulation and empirical studies used to assess the numerical performance of the proposed efficient estimator. Section 5 contains some technical assumptions and the proof of the asymptotic equivalence. In the supplementary material [5] we give additional simulation results for estimation, proofs of the other theoretical results, some lemmas, and theoretical results when the $m_i$'s are uniformly bounded.

2. Semiparametric efficiency bound for $\beta$. In this section, $V_i$ is a given $m_i\times m_i$ inverse weight matrix depending only on $X_i$, $Z_i$ and $T_i$, $i=1,\dots,n$. We use a $K_n$-dimensional equispaced B-spline basis on $[0,1]$, denoted by $B(t)$, to approximate the function $g(t)$. See [19] for the definition and properties of B-spline bases. We set $W_{ij}=Z_{ij}\otimes B(T_{ij})$ and $W_i=(W_{i1},\dots,W_{im_i})^T$, where $\otimes$ is the Kronecker product, and we denote the true values of $\beta$ and $g(t)$ by $\beta_0$ and $g_0(t)=(g_{01}(t),\dots,g_{0q}(t))^T$, respectively. Then we estimate $\beta_0$ and $g_0(t)$ by minimizing with respect to $\beta$ and $\gamma$ simultaneously the following objective function:
$$
\sum_{i=1}^n\big(Y_i-\mu(X_i\beta+W_i\gamma)\big)^TV_i^{-1}\big(Y_i-\mu(X_i\beta+W_i\gamma)\big), \tag{2.1}
$$
where $\gamma\in\mathbb R^{qK_n}$ and the $j$th element of $\mu(X_i\beta+W_i\gamma)$ is $\mu(X_{ij}^T\beta+W_{ij}^T\gamma)$. Thus the generalized estimating equations are
$$
\sum_{i=1}^n X_i^T\Lambda_iV_i^{-1}\big(Y_i-\mu(X_i\beta+W_i\gamma)\big)=0
\quad\text{and}\quad
\sum_{i=1}^n W_i^T\Lambda_iV_i^{-1}\big(Y_i-\mu(X_i\beta+W_i\gamma)\big)=0, \tag{2.2}
$$
where $\Lambda_i$ is an $m_i\times m_i$ diagonal matrix defined by $\Lambda_i=\mathrm{diag}(\mu'(X_{i1}^T\beta+W_{i1}^T\gamma),\dots,\mu'(X_{im_i}^T\beta+W_{im_i}^T\gamma))$. Denote the solution to (2.2) by $\hat\beta_V$ and $\hat\gamma_V=(\hat\gamma_{1V}^T,\dots,\hat\gamma_{qV}^T)^T$. Then the GEE spline estimator with weight matrices $V_i^{-1}$, $i=1,\dots,n$, for $\beta_0$ is $\hat\beta_V$ and that for $g_0(t)$ is $(\hat\gamma_{1V}^TB(t),\dots,\hat\gamma_{qV}^TB(t))^T$.
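For a concrete picture of (2.2), here is a minimal NumPy sketch that solves the estimating equations by Fisher scoring, stacking $D_i=(X_i,W_i)$ and $\theta=(\beta^T,\gamma^T)^T$. Fisher scoring, the zero starting value and the name `gee_spline_fit` are assumptions made for illustration, not necessarily the authors' algorithm, and the spline rows $W_{ij}$ are taken as given. For the identity link ($\Lambda_i=I_{m_i}$) a single step reproduces the closed-form GLS solution $(\sum_iD_i^TV_i^{-1}D_i)^{-1}\sum_iD_i^TV_i^{-1}Y_i$.

```python
import numpy as np

def gee_spline_fit(X, W, Y, V, mu, mu_prime, n_iter=50, tol=1e-10):
    """Solve the GEE (2.2) for theta = (beta, gamma) by Fisher scoring.

    X, W, Y, V   : lists over subjects of X_i (m_i x p), W_i (m_i x qK_n),
                   Y_i (m_i,) and the matrices V_i (m_i x m_i)
    mu, mu_prime : mean function and its derivative, applied elementwise
    Returns (beta_hat, gamma_hat).
    """
    p = X[0].shape[1]
    D = [np.hstack([Xi, Wi]) for Xi, Wi in zip(X, W)]  # D_i = (X_i, W_i)
    theta = np.zeros(D[0].shape[1])                     # starting value (an assumption)
    for _ in range(n_iter):
        info = np.zeros((theta.size, theta.size))
        score = np.zeros(theta.size)
        for Di, Yi, Vi in zip(D, Y, V):
            eta = Di @ theta
            LD = mu_prime(eta)[:, None] * Di                    # Lambda_i D_i
            info += LD.T @ np.linalg.solve(Vi, LD)              # D_i^T L_i V_i^{-1} L_i D_i
            score += LD.T @ np.linalg.solve(Vi, Yi - mu(eta))   # left-hand sides of (2.2)
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta[:p], theta[p:]
```

For the identity link, pass `mu=lambda x: x` and `mu_prime=lambda x: np.ones_like(x)`.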

Hereafter we focus on the identity link function and present the asymptotic normality of $\hat\beta_V$ in Proposition 1 under general error distributions, as specified in Assumption A6 given in Section 5. We allow some of the $m_i$'s to diverge in a way such as $\sum_{i=1}^nm_i^2=O(n)$ and $\max_{1\le i\le n}m_i=O(n^{1/8})$. See Assumptions A1 and A2 for the specific conditions on the $m_i$'s. We refer to the supplement [5] for the results for general link functions when the $m_i$'s are uniformly bounded and the $\epsilon_i$'s satisfy the sub-Gaussian property.

First, we introduce some function spaces, inner products and projections. Let $L_2$ denote the space of square integrable functions on $[0,1]$ and recall that $B(t)$ is the equispaced B-spline basis on $[0,1]$. We define two function spaces:
$$
G=\{(g_1,\dots,g_q)^T\mid g_j\in L_2,\ j=1,\dots,q\}
\quad\text{and}\quad
G_B=\{(B^T\gamma_1,\dots,B^T\gamma_q)^T\mid\gamma=(\gamma_1^T,\dots,\gamma_q^T)^T\in\mathbb R^{qK_n}\}.
$$
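Elements of $G_B$ and the rows $W_{ij}=Z_{ij}\otimes B(T_{ij})$ can be generated with a few lines of NumPy. The sketch below is illustrative and rests on several assumptions: it uses the Cox-de Boor recursion with clamped equispaced knots (so $K_n$ equals the number of interior knots plus the degree plus one), it defaults to cubic splines, and the names `bspline_basis` and `spline_design` are hypothetical.

```python
import numpy as np

def bspline_basis(t, n_interior, degree=3):
    """Equispaced clamped B-spline basis on [0, 1] via the Cox-de Boor recursion.

    Returns an array of shape (len(t), K_n) with K_n = n_interior + degree + 1.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    inner = np.linspace(0.0, 1.0, n_interior + 2)      # boundary and interior knots
    knots = np.concatenate([np.zeros(degree), inner, np.ones(degree)])
    n0 = len(knots) - 1
    B = np.zeros((t.size, n0))
    for j in range(n0):                                # degree-0 indicators of [k_j, k_{j+1})
        B[:, j] = (knots[j] <= t) & (t < knots[j + 1])
    last = np.nonzero(knots[:-1] < knots[1:])[0].max()
    B[t == 1.0, last] = 1.0                            # close the right end of [0, 1]
    for d in range(1, degree + 1):
        Bd = np.zeros((t.size, n0 - d))
        for j in range(n0 - d):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:
                Bd[:, j] += (t - knots[j]) / left_den * B[:, j]
            if right_den > 0:
                Bd[:, j] += (knots[j + d + 1] - t) / right_den * B[:, j + 1]
        B = Bd
    return B

def spline_design(Z_i, T_i, n_interior, degree=3):
    """Stack the rows W_ij = Z_ij kron B(T_ij) into the m_i x (q K_n) matrix W_i."""
    B = bspline_basis(T_i, n_interior, degree)
    return np.vstack([np.kron(z, b) for z, b in zip(Z_i, B)])
```

Because `np.kron(z, b)` lists $z_1B(T_{ij})$ first, the coefficient vector is ordered as $\gamma=(\gamma_1^T,\dots,\gamma_q^T)^T$, matching the definition of $G_B$.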

Note that $G_B\subset G$. Next, let $v_1$ and $v_2$ be two stochastic processes, each taking scalar values at $T_{ij}$, $i=1,\dots,n$, $j=1,\dots,m_i$. Then we define two inner products of $v_1$ and $v_2$ by $\langle v_1,v_2\rangle_n=n^{-1}\sum_{i=1}^nv_{1i}^TV_i^{-1}v_{2i}$ and $\langle v_1,v_2\rangle_V=E\{\langle v_1,v_2\rangle_n\}$, where $v_{1i}$ and $v_{2i}$ are defined in the same way as $T_i$, and we define the associated norms by $\|v\|_n=\langle v,v\rangle_n^{1/2}$ and $\|v\|_V=\langle v,v\rangle_V^{1/2}$. The projections, with respect to $\|\cdot\|_V$, of the $k$th element of $X$ onto $Z^TG$ and $Z^TG_B$ are given by
$$
\Pi_VX_k=\mathop{\arg\min}_{g\in G}\|X_k-Z^Tg\|_V\quad\text{and}\quad\Pi_{Vn}X_k=\mathop{\arg\min}_{g\in G_B}\|X_k-Z^Tg\|_V, \tag{2.3}
$$
where
$$
\|X_k-Z^Tg\|_V^2=\frac{1}{n}\sum_{i=1}^nE\big\{(X_{ik}-(Z^Tg)_i)^TV_i^{-1}(X_{ik}-(Z^Tg)_i)\big\},
$$
with $X_{ik}=(X_{i1k},\dots,X_{im_ik})^T$ and $(Z^Tg)_i=(Z_{i1}^Tg(T_{i1}),\dots,Z_{im_i}^Tg(T_{im_i}))^T$. Hereafter we write $\varphi^*_{Vk}=\Pi_VX_k\in G$ and $\varphi_{Vk}=\Pi_{Vn}X_k\in G_B$.

Assumption S.

(i) The projections $\varphi^*_{Vk}(t)$, $k=1,\dots,p$, and the varying coefficient function $g_0$ are twice continuously differentiable on $[0,1]$, and they and their second order derivatives are uniformly bounded in $n$.

(ii) We take $K_n=\lfloor c_Kn^{1/5}\rfloor$ for some positive constant $c_K$, where $\lfloor x\rfloor$ is the largest integer no greater than $x$.

Assumption S(i) is a mild and standard assumption for semiparametric models. We consider the existence and smoothness properties of $\varphi^*_{Vk}(t)$ in Section 5. Recall that all the covariates are assumed to be uniformly bounded. Since the relevant functions are assumed to be at least twice continuously differentiable, we recommend quadratic or cubic spline approximation; the order of $K_n$ specified in Assumption S(ii) is then optimal. If the smoothness of different functions varies, we refer to [1] for the convergence rate interference phenomenon.
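In samples, replacing the expectation in the norm of (2.3) by the average, that is, using $\|\cdot\|_n$, turns the projection of $X_k$ onto $Z^TG_B$ into a generalized least squares regression of the $X_{ik}$ on the $W_i$, because $(Z^Tg)_i=W_i\gamma$ for $g\in G_B$. A minimal sketch under that reading, with hypothetical function names and the matrices $W_i$ taken as given:

```python
import numpy as np

def inner_product_n(v1, v2, V):
    """Sample inner product <v1, v2>_n = (1/n) sum_i v1_i^T V_i^{-1} v2_i."""
    return sum(a @ np.linalg.solve(Vi, b) for a, b, Vi in zip(v1, v2, V)) / len(V)

def project_onto_spline_space(Xk, W, V):
    """Minimize ||X_k - Z^T g||_n over g in G_B, i.e. GLS of X_ik on W_i.

    Xk : list of length-m_i vectors X_ik;  W : list of W_i;  V : list of V_i.
    Returns the spline coefficients gamma and the fitted vectors W_i gamma.
    """
    A = sum(Wi.T @ np.linalg.solve(Vi, Wi) for Wi, Vi in zip(W, V))
    r = sum(Wi.T @ np.linalg.solve(Vi, xi) for Wi, xi, Vi in zip(W, Xk, V))
    gamma = np.linalg.solve(A, r)
    fitted = [Wi @ gamma for Wi in W]
    return gamma, fitted
```

The residual $X_k-Z^T\hat g$ is then orthogonal, in $\langle\cdot,\cdot\rangle_n$, to every element of $Z^TG_B$, which is what characterizes a projection.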

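The following matrices are necessary in order to present the asymptotic normality of $\hat\beta_V$:
$$
H=\begin{pmatrix}\sum_{i=1}^nX_i^TV_i^{-1}X_i&\sum_{i=1}^nX_i^TV_i^{-1}W_i\\ \sum_{i=1}^nW_i^TV_i^{-1}X_i&\sum_{i=1}^nW_i^TV_i^{-1}W_i\end{pmatrix}=\begin{pmatrix}H_{11}&H_{12}\\ H_{21}&H_{22}\end{pmatrix}, \tag{2.4}
$$
$$
H_{11.2}=H_{11}-H_{12}H_{22}^{-1}H_{21}\quad\text{and}\quad H^{11}=(H_{11.2})^{-1}.
$$
Let $\Omega_{Vn}$ be a $p\times p$ matrix whose $(k,l)$th element is
$$
\langle X_k-Z^T\varphi^*_{Vk},X_l-Z^T\varphi^*_{Vl}\rangle_V=\frac{1}{n}\sum_{i=1}^nE\big\{(X_{ik}-(Z^T\varphi^*_{Vk})_i)^TV_i^{-1}(X_{il}-(Z^T\varphi^*_{Vl})_i)\big\}.
$$
Note that $n^{-1}H_{11.2}$ is an estimate of $\Omega_{Vn}$. We assume that there exists a $p\times p$ positive definite matrix $\Omega_V$ such that
$$
\lim_{n\to\infty}\Omega_{Vn}=\Omega_V. \tag{2.5}
$$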

Now we are ready to state the asymptotic normality of $\hat\beta_V$ under general error distributions, as specified in Assumption A6 given in Section 5. Its proof is given in the supplement [5]. We denote the normal distribution with mean $\eta$ and covariance $\Omega$ by $N(\eta,\Omega)$, and by $\xrightarrow{d}$ we mean convergence in distribution. Let $I_l$ be the $l$-dimensional identity matrix.

Proposition 1. (Asymptotic normality of $\hat\beta_V$) Under Assumption S, (2.5), and Assumptions A1-A6 given in Section 5, we have
$$
\hat\beta_V=\beta_0+H^{11}\sum_{i=1}^n(X_i-W_iH_{22}^{-1}H_{21})^TV_i^{-1}\epsilon_i+o_p\Big(\frac{1}{\sqrt n}\Big).
$$
We also have
$$
\Gamma_V^{-1/2}(\hat\beta_V-\beta_0)\xrightarrow{d}N(0,I_p),
$$
where $\Gamma_V$ is given by
$$
\Gamma_V=H^{11}\sum_{i=1}^n\big\{(X_i-W_iH_{22}^{-1}H_{21})^TV_i^{-1}\Sigma_iV_i^{-1}(X_i-W_iH_{22}^{-1}H_{21})\big\}H^{11}. \tag{2.6}
$$
Under (2.5), $\hat\beta_V$ is $\sqrt n$-consistent for $\beta_0$. We can estimate its asymptotic covariance $\Gamma_V$ given in (2.6) by replacing the $\Sigma_i$'s with some estimates based on $\hat\beta_V$ and $\hat\gamma_V$. For example, we can replace $\Sigma_i$ with $\hat\epsilon_i\hat\epsilon_i^T$, where $\hat\epsilon_i=Y_i-X_i\hat\beta_V-W_i\hat\gamma_V$. However, this approach may be too crude, and it does not make use of the common information on the covariance structure contained in different subjects. Alternatively, we can estimate the $\Sigma_i$'s by applying smoothing techniques to some residuals, based on some assumption on the covariance structure. We investigate this problem in Section 3.
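The crude plug-in estimate of $\Gamma_V$ just described can be sketched as follows, assuming NumPy and per-subject lists of $X_i$, $W_i$, $V_i$ and residual vectors $\hat\epsilon_i$; the name `sandwich_cov` is hypothetical. It forms the blocks of $H$ in (2.4), the Schur complement $H_{11.2}$, and then (2.6) with $\Sigma_i$ replaced by $\hat\epsilon_i\hat\epsilon_i^T$.

```python
import numpy as np

def sandwich_cov(X, W, V, resid):
    """Plug-in estimate of Gamma_V in (2.6) with Sigma_i replaced by e_i e_i^T.

    X, W, V, resid : lists over subjects of X_i (m_i x p), W_i (m_i x qK_n),
                     V_i (m_i x m_i) and residual vectors e_i (m_i,).
    Returns (Gamma_hat, H11_2).
    """
    V_inv = [np.linalg.inv(Vi) for Vi in V]
    H11 = sum(Xi.T @ Vinv @ Xi for Xi, Vinv in zip(X, V_inv))
    H12 = sum(Xi.T @ Vinv @ Wi for Xi, Wi, Vinv in zip(X, W, V_inv))
    H22 = sum(Wi.T @ Vinv @ Wi for Wi, Vinv in zip(W, V_inv))
    H21 = H12.T                                    # V_i is symmetric
    H22_inv_H21 = np.linalg.solve(H22, H21)        # H_22^{-1} H_21
    H11_2 = H11 - H12 @ H22_inv_H21                # Schur complement H_{11.2}
    H_sup11 = np.linalg.inv(H11_2)                 # H^{11}
    meat = np.zeros_like(H11)
    for Xi, Wi, Vinv, ei in zip(X, W, V_inv, resid):
        s = (Xi - Wi @ H22_inv_H21).T @ (Vinv @ ei)    # s_i, a p-vector
        meat += np.outer(s, s)
    return H_sup11 @ meat @ H_sup11, H11_2
```

Since each summand in (2.6) becomes an outer product once $\Sigma_i=\hat\epsilon_i\hat\epsilon_i^T$, the sketch accumulates $s_is_i^T$ with $s_i=(X_i-W_iH_{22}^{-1}H_{21})^TV_i^{-1}\hat\epsilon_i$ instead of forming $m_i\times m_i$ matrices.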

Next, Proposition 2 gives the semiparametric efficiency bound for estimation of $\beta_0$. It can be proved in almost the same way as in Section 4.4 of [13] and Lemma 1 of [4], and the proof is omitted. We denote the semiparametric efficient score function of $\beta$ by $l^*_\beta=(l^*_{\beta1},\dots,l^*_{\beta p})^T$. Its expression is given in Proposition 2. We denote $\varphi^*_{Vk}(t)$ by $\varphi_{\mathrm{eff},k}(t)$ when $V_i=\Sigma_i$ in (2.1).

Proposition 2. (Semiparametric efficiency bound) Under the same assumptions as in Proposition 1, we have
$$
l^*_{\beta k}=\sum_{i=1}^n(X_{ik}-(Z^T\varphi_{\mathrm{eff},k})_i)^T\Sigma_i^{-1}\{Y_i-X_i\beta_0-(Z^Tg_0)_i\},
$$
and the semiparametric efficient information matrix for $\beta$ is given by
$$
\lim_{n\to\infty}\frac{1}{n}E\{l^*_\beta(l^*_\beta)^T\}=\Omega_\Sigma\quad\text{with }V_i=\Sigma_i\text{ in (2.5)}.
$$

Proposition 3 gives the asymptotic normality of $\hat\beta_\Sigma$, the so-called oracle estimator, which uses the true covariance structure in the GEE spline regression. It also asserts that $\hat\beta_\Sigma$ achieves the semiparametric efficiency bound derived from Proposition 2. The proof is given in the supplement [5].

Proposition 3. (Oracle efficient estimator) If we take $V_i=\Sigma_i$ in (2.2), then, under the same assumptions as in Proposition 1, we have
$$
\sqrt n\,\Omega_\Sigma^{1/2}(\hat\beta_\Sigma-\beta_0)\xrightarrow{d}N(0,I_p).
$$
In practice, the $\Sigma_i$'s are usually unknown, and we have no direct access to the semiparametric efficient score function or the oracle estimator. In the next section we study nonparametric estimation of the covariances so as to improve the efficiency.

3. Efficient estimation. The semiparametric efficiency bound of $\beta$ given in Proposition 2 indicates that knowledge, or at least estimation, of the $\Sigma_i$'s is necessary in order to construct a semiparametric efficient estimator. On the other hand, as discussed in the Introduction, when the $\Sigma_i$'s are unknown it is almost impossible to estimate them in a fully nonparametric way. Fortunately, for longitudinal or clustered data sets, it is reasonable to make assumptions such as
$$
\Sigma_i=\Sigma(T_i),\quad i=1,\dots,n, \tag{3.1}
$$
where the $(j,j)$th element of $\Sigma_i$ is given by $\sigma^2(T_{ij})$ and the $(j,j')$th element is given by $\sigma(T_{ij},T_{ij'})$ when $j\ne j'$, for some smooth functions $\sigma^2(t)$ and $\sigma(s,t)$. Based on (3.1), in Section 3.1 we construct nonparametric estimates of the covariances and then use them to derive an FGLS procedure to improve the efficiency, and in Section 3.2 we show its asymptotic equivalence to the oracle estimator $\hat\beta_\Sigma$. We also discuss estimation of the nonparametric component.

3.1. Methodology. A preliminary estimation of $\beta_0$ and $g_0$ is necessary before we can estimate the covariances. For simplicity and robustness, we utilize working independence in the GEE spline estimation. As noted following Proposition 1, we could then use the resultant residuals to estimate the covariance matrices directly. However, it is intuitively better to further make use of the covariance structure (3.1) by applying nonparametric smoothing techniques to the residuals. In addition, as an alternative to the spline estimator, we could apply smoothing techniques to the pseudo responses $Y_{ij}-X_{ij}^T\hat\beta_I$ to obtain another estimator of $g_0$. We take this latter approach for the technical and numerical reasons given in Remark 1. After the preliminary estimation, for each $i=1,\dots,n$, we estimate $\Sigma_i$ by applying local linear regression and denote the resultant estimate by $\hat\Sigma_i$. Our final estimator of $\beta_0$ is then obtained by taking $V_i=\hat\Sigma_i$, $i=1,\dots,n$, in the GEE spline estimation. Note that in the trivial case where $m_i$ is fixed for all $i$ and the $T_{ij}$'s are equispaced, we can estimate $\Sigma_i$ without using any smoothing techniques.

Let $K$ be a given kernel function. Our estimation procedure is formally specified as follows:

Step 1. Estimate $\beta_0$ by the GEE spline method given in Section 2 with $V_i=I_{m_i}$, $i=1,\dots,n$, and denote the resultant working independence estimate by $\hat\beta_I$.

Step 2. Estimate $g_0(t)$ by applying local linear regression to $\{Y_{ij}-X_{ij}^T\hat\beta_I,\ i=1,\dots,n,\ j=1,\dots,m_i\}$, using bandwidth $h_1$. We denote the resultant estimate by $\hat g(t)$, which is written as
$$
\hat g(t)=D_q\big(A_{1n}(t)\big)^{-1}\frac{1}{N_1h_1}\sum_{i=1}^n\sum_{j=1}^{m_i}\Big\{Z_{ij}\otimes\Big(1,\frac{T_{ij}-t}{h_1}\Big)^T\Big\}K\Big(\frac{T_{ij}-t}{h_1}\Big)(Y_{ij}-X_{ij}^T\hat\beta_I), \tag{3.2}
$$
where $N_1=\sum_{i=1}^nm_i$, $D_q=I_q\otimes(1\ 0)$, and
$$
A_{1n}(t)=\frac{1}{N_1h_1}\sum_{i=1}^n\sum_{j=1}^{m_i}\Big\{Z_{ij}Z_{ij}^T\otimes\begin{pmatrix}1&\frac{T_{ij}-t}{h_1}\\ \frac{T_{ij}-t}{h_1}&\big(\frac{T_{ij}-t}{h_1}\big)^2\end{pmatrix}\Big\}K\Big(\frac{T_{ij}-t}{h_1}\Big).
$$
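Formula (3.2) is a kernel-weighted least squares fit with regressors $Z_{ij}\otimes(1,(T_{ij}-t)/h_1)^T$, from which $D_q$ keeps the $q$ intercepts. The following NumPy sketch evaluates it at a single point $t$. The biweight kernel is an illustrative choice (a continuously differentiable, symmetric, compactly supported density, as Assumption H(i) requires), and the pooled-array layout and function names are assumptions.

```python
import numpy as np

def biweight(u):
    """Biweight kernel (15/16)(1 - u^2)^2 on [-1, 1]; an illustrative choice of K."""
    return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def local_linear_vc(t, T, Z, R, h, kernel=biweight):
    """Local linear estimate of g_0(t) as in (3.2).

    T : (N1,) pooled times T_ij;  Z : (N1, q) pooled Z_ij
    R : (N1,) pooled pseudo responses Y_ij - X_ij^T beta_I;  h : bandwidth h_1
    """
    u = (T - t) / h
    w = kernel(u) / (len(T) * h)                                  # K((T_ij - t)/h_1) / (N_1 h_1)
    U = np.array([np.kron(z, [1.0, ui]) for z, ui in zip(Z, u)])   # rows Z_ij kron (1, u_ij)
    A = U.T @ (w[:, None] * U)                                    # A_1n(t)
    b = U.T @ (w * R)
    theta = np.linalg.solve(A, b)
    return theta[0::2]                                            # D_q = I_q kron (1 0)
```

Feeding the routine $Y_{ij}-X_{ij}^T\hat\beta$ for an updated $\hat\beta$ gives the local linear refit described in Step 7.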


Step 3. Calculate the residuals, denoted by $\hat\epsilon_{ij}$, given by
$$
\hat\epsilon_{ij}=Y_{ij}-X_{ij}^T\hat\beta_I-Z_{ij}^T\hat g(T_{ij}),\quad i=1,\dots,n,\ j=1,\dots,m_i.
$$


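Step 4. Estimate the variance function $\sigma^2(t)$ by applying local linear regression with bandwidth $h_2$ to the squared residuals. Denote the resultant estimate by $\hat\sigma^2(t)$; it can be expressed as
$$
\hat\sigma^2(t)=(1\ 0)\big(A_{2n}(t)\big)^{-1}\frac{1}{N_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\begin{pmatrix}1\\ \frac{T_{ij}-t}{h_2}\end{pmatrix}K\Big(\frac{T_{ij}-t}{h_2}\Big)\hat\epsilon_{ij}^2, \tag{3.3}
$$
where
$$
A_{2n}(t)=\frac{1}{N_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\begin{pmatrix}1&\frac{T_{ij}-t}{h_2}\\ \frac{T_{ij}-t}{h_2}&\big(\frac{T_{ij}-t}{h_2}\big)^2\end{pmatrix}K\Big(\frac{T_{ij}-t}{h_2}\Big).
$$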
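Step 4 is a scalar local linear smoother applied to the pooled pairs $(T_{ij},\hat\epsilon_{ij}^2)$. A minimal NumPy sketch of (3.3), again assuming a biweight kernel and pooled arrays; the function names are hypothetical.

```python
import numpy as np

def biweight(u):
    """Biweight kernel on [-1, 1]; an illustrative choice of K."""
    return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def local_linear_1d(t, T, y, h, kernel=biweight):
    """(1 0) A_2n(t)^{-1} (1/(N_1 h)) sum_ij (1, u_ij)^T K(u_ij) y_ij, with u_ij = (T_ij - t)/h."""
    u = (T - t) / h
    w = kernel(u) / (len(T) * h)
    U = np.column_stack([np.ones_like(u), u])
    A = U.T @ (w[:, None] * U)                    # A_2n(t)
    return np.linalg.solve(A, U.T @ (w * y))[0]   # intercept = estimate at t

def sigma2_hat(t, T, resid, h2):
    """Step 4: smooth the squared residuals e_ij^2 against T_ij with bandwidth h_2."""
    return local_linear_1d(t, T, resid**2, h2)
```

Nothing in (3.3) forces $\hat\sigma^2(t)$ to be positive in small samples; the sketch leaves that as is.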


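Step 5. Estimate the covariance function $\sigma(s,t)$ by applying local linear regression with bandwidth $h_3$ to $\{\hat\epsilon_{ij}\hat\epsilon_{ij'},\ j\ne j',\ i=1,\dots,n\}$. We denote the resultant estimate by $\hat\sigma(s,t)$; it has the following expression:
$$
\hat\sigma(s,t)=(1\ 0\ 0)\big(A_{3n}(s,t)\big)^{-1}\frac{1}{N_2h_3^2}\sum_{i=1}^n\sum_{j\ne j'}\begin{pmatrix}1\\ \frac{T_{ij}-s}{h_3}\\ \frac{T_{ij'}-t}{h_3}\end{pmatrix}K\Big(\frac{T_{ij}-s}{h_3}\Big)K\Big(\frac{T_{ij'}-t}{h_3}\Big)\hat\epsilon_{ij}\hat\epsilon_{ij'}, \tag{3.4}
$$
where $N_2=\sum_{i=1}^nm_i(m_i-1)$ and
$$
A_{3n}(s,t)=\frac{1}{N_2h_3^2}\sum_{i=1}^n\sum_{j\ne j'}\begin{pmatrix}1\\ \frac{T_{ij}-s}{h_3}\\ \frac{T_{ij'}-t}{h_3}\end{pmatrix}\begin{pmatrix}1\\ \frac{T_{ij}-s}{h_3}\\ \frac{T_{ij'}-t}{h_3}\end{pmatrix}^TK\Big(\frac{T_{ij}-s}{h_3}\Big)K\Big(\frac{T_{ij'}-t}{h_3}\Big).
$$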
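Step 5 first collects the products $\hat\epsilon_{ij}\hat\epsilon_{ij'}$ over all ordered within-subject pairs $j\ne j'$ (there are $N_2$ of them) and then fits a local plane with a product kernel. A minimal NumPy sketch of (3.4), assuming a biweight kernel and per-subject arrays; the function names are hypothetical.

```python
import numpy as np

def biweight(u):
    """Biweight kernel on [-1, 1]; an illustrative choice of K."""
    return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def offdiag_pairs(T_list, e_list):
    """Collect (T_ij, T_ij', e_ij * e_ij') over all ordered pairs j != j' within each subject."""
    S, T2, P = [], [], []
    for Ti, ei in zip(T_list, e_list):
        for j in range(len(Ti)):
            for jp in range(len(Ti)):
                if j != jp:
                    S.append(Ti[j])
                    T2.append(Ti[jp])
                    P.append(ei[j] * ei[jp])
    return np.array(S), np.array(T2), np.array(P)

def sigma_hat(s, t, S, T2, P, h3, kernel=biweight):
    """Step 5 / (3.4): local linear fit of the products on ((T_ij - s)/h3, (T_ij' - t)/h3)."""
    u, v = (S - s) / h3, (T2 - t) / h3
    w = kernel(u) * kernel(v) / (len(S) * h3**2)   # len(S) = N_2
    U = np.column_stack([np.ones_like(u), u, v])
    A = U.T @ (w[:, None] * U)                      # A_3n(s, t)
    return np.linalg.solve(A, U.T @ (w * P))[0]     # (1 0 0) picks the intercept
```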


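Step 6. Calculate $\hat\Sigma_i$ by combining the results from Steps 4 and 5, letting its $(j,j')$th element be
$$
\hat\sigma(T_{ij},T_{ij'})I(j\ne j')+\hat\sigma^2(T_{ij})I(j=j'),
$$
and then estimate $\beta_0$ with $V_i=\hat\Sigma_i$ in the GEE (2.2). Denote the resultant estimate of $\beta_0$ by $\hat\beta_{\hat\Sigma}$.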
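Step 6 fills an $m_i\times m_i$ matrix from the two smoothed functions. A minimal sketch, assuming $\hat\sigma^2$ and $\hat\sigma$ are available as Python callables (for instance, wrappers around the Step 4 and Step 5 smoothers evaluated on the pooled data); the name `assemble_sigma_i` is hypothetical.

```python
import numpy as np

def assemble_sigma_i(T_i, sigma2_fn, sigma_fn):
    """Sigma_hat_i with (j, j') entry sigma_hat(T_ij, T_ij') if j != j' and sigma2_hat(T_ij) if j == j'."""
    m = len(T_i)
    S = np.empty((m, m))
    for j in range(m):
        for jp in range(m):
            S[j, jp] = sigma2_fn(T_i[j]) if j == jp else sigma_fn(T_i[j], T_i[jp])
    return S
```

The resulting matrices are then used as $V_i$ in (2.2). They are symmetric only if $\hat\sigma(s,t)=\hat\sigma(t,s)$; with all ordered pairs used in Step 5 this holds up to the symmetry of the pooled data.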

Step 7. Update the nonparametric estimator of $g_0(t)$ given in Step 2 by replacing $Y_{ij}-X_{ij}^T\hat\beta_I$ with $Y_{ij}-X_{ij}^T\hat\beta_{\hat\Sigma}$, $i=1,\dots,n$, $j=1,\dots,m_i$. Denote the resultant estimator by $\hat g_U(t)$. Alternatively, we can estimate $g_0(t)$ with splines, by replacing $\beta$ with $\hat\beta_{\hat\Sigma}$ and taking $V_i=\hat\Sigma_i$ in the GEE (2.2). Denote the resultant estimator by $\hat g_S(t)$.

In general, the covariance function estimate $\hat\sigma(s,t)$ given by Step 5 may not be positive semidefinite. We can modify it by truncating the eigenfunctions in its spectral decomposition whose eigenvalues do not exceed some nonnegative constant $\lambda_L$. Then we have positive definite covariance estimates if we replace $\hat\sigma(s,t)$ with this modified version in Step 6.
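The eigenvalue truncation can be sketched on a grid: evaluate $\hat\sigma(s,t)$ at $M$ equispaced points, eigendecompose the symmetrized matrix, and drop the components whose eigenvalues do not exceed $\lambda_L$. The grid discretization and the $1/M$ quadrature weight (so that matrix eigenvalues approximate those of the integral operator) are assumptions of this sketch, `truncate_covariance` is a hypothetical name, and mapping the truncated grid values back to arbitrary $(T_{ij},T_{ij'})$, for example by interpolation, is left out.

```python
import numpy as np

def truncate_covariance(C, lam_L=0.0, weight=None):
    """Zero out spectral components of a gridded covariance with eigenvalue <= lam_L.

    C      : (M, M) values sigma_hat(s_a, s_b) on an equispaced grid of M points in [0, 1]
    weight : quadrature weight so that weight * eigenvalues(C) approximate the
             eigenvalues of the integral operator; defaults to 1/M (an assumption)
    """
    M = C.shape[0]
    w = 1.0 / M if weight is None else weight
    C = 0.5 * (C + C.T)                         # enforce symmetry before decomposing
    evals, evecs = np.linalg.eigh(C * w)        # approximate operator eigenpairs
    evals_kept = np.where(evals > lam_L, evals, 0.0)
    return (evecs * evals_kept) @ evecs.T / w   # back to the scale of C
```

Setting `lam_L=0.0` only removes negative (and zero) components; a strictly positive threshold also discards small, noisy ones.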


Remark 1. When we calculate $\hat\beta_I$ in Step 1, we also obtain $\hat\gamma_I$ and the set of residuals $\{\tilde\epsilon_{ij}=Y_{ij}-X_{ij}^T\hat\beta_I-W_{ij}^T\hat\gamma_I\}$. We could then omit Steps 2 and 3 of our procedure by exploiting this set of residuals when we estimate $\Sigma_i$ in Steps 4-6. However, our simulation results summarized in Section 4 indicate that this simplified approach is inferior to the proposed one. Intuitively speaking, to achieve semiparametric efficiency in the GEE spline estimation of $\beta_0$, the accompanying estimation of $g_0(t)$ requires some undersmoothing, and thus it often exhibits spurious wiggling patterns. Besides, it is difficult to justify this simplified approach theoretically, as the local properties of spline estimators seem to be intractable.


3.2. Asymptotic results. First we establish the asymptotic equivalence between the data-driven estimator $\hat\beta_{\hat\Sigma}$ and the oracle estimator $\hat\beta_\Sigma$ by exploiting some desirable properties of $\hat\Sigma_i$. We begin by specifying our assumptions on the smoothness of $g_0(t)$, $\sigma^2(t)$ and $\sigma(s,t)$. We need Assumption B given below, which is more restrictive than usual, in order to evaluate the difference between $\hat\Sigma_i^{-1}$ and $\Sigma_i^{-1}$.

Assumption B.

(i) Assumption (3.1) holds.

(ii) The true varying coefficient function $g_0(t)$ is three times continuously differentiable on $[0,1]$.

(iii) The variance function $\sigma^2(t)$ is three times continuously differentiable on $[0,1]$.

(iv) The covariance function $\sigma(s,t)$ is three times continuously differentiable on $[0,1]^2$.

In the following we collect our assumptions on the kernel function $K$ and the three bandwidths used in the construction of the proposed estimator. Assumption H(i) on $K$ is standard. When Assumption B holds, our assumptions on the bandwidths $h_1$, $h_2$ and $h_3$ are not restrictive. For example, the optimal order of $h_1$ and $h_2$ is $n^{-1/5}$, which falls into the specified range. A larger order is recommended only for $h_3$ due to the two-dimensional smoothing in Step 5. However, since the effective number of observations used in Step 5 of the procedure is $N_2$, we anticipate that bandwidth choice will not seriously affect the performance of our final estimator.

Assumption H.

(i) The kernel function $K$ is a continuously differentiable symmetric density function with compact support.

(ii) The bandwidths $h_1$, $h_2$ and $h_3$ satisfy $h_1=c_1n^{-a_h}$ for some $1/6 < a_h < 1/4$, $h_2=c_2n^{-b_h}$ for some $1/6 < b_h < 1/4$, and $h_3=c_3n^{-c_h}$ for some $1/6 < c_h < 1/4$, where $c_1$, $c_2$ and $c_3$ are some positive constants.

The asymptotic expression of $\hat\Sigma_i$ is given in Proposition 4, which is verified in the supplementary material [5]. Note that we need more elaborate representations than those used by [14], since we deal with a $(p+qK_n)$-dimensional linear regression model. Note also that the functions $B_j$, $j=1,\dots,4$, that appear in Proposition 4 are implicitly defined in the proof of the proposition, and only their boundedness is needed in the proof of Theorem 1.

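Proposition 4 (Representations of the covariance estimators). Under the assumptions in Proposition 1 with $V_i = I_{m_i}$, and Assumptions B and H, we have the following representations of $\hat\sigma^2(t)$ and $\hat\sigma(s,t)$. Uniformly in $t$,
$$\hat\sigma^2(t) - \sigma^2(t) = B_1(t)h_2^2 + B_2(t)E_1(t) + O_p(h_1^3 + h_2^3) + O_p\Big(\frac{\log n}{nh_1}\Big),$$
where, uniformly in $t$,
$$E_1(t) = \frac{1}{N_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\begin{pmatrix}1\\ (T_{ij}-t)/h_2\end{pmatrix}K\Big(\frac{T_{ij}-t}{h_2}\Big)\{\epsilon_{ij}^2 - \sigma^2(T_{ij})\} = O_p\Big(\sqrt{\frac{\log n}{nh_2}}\Big),$$
and $B_1(t)$ and $B_2(t)$ are bounded functions. Uniformly in $s$ and $t$ ($s \neq t$),
$$\hat\sigma(s,t) - \sigma(s,t) = B_3(s,t)h_3^2 + B_4(s,t)E_2(s,t) + O_p(h_1^3 + h_3^3) + O_p\Big(\frac{\log n}{nh_1} + \cdots\Big),$$
where
$$E_2(s,t) = \frac{1}{N_2h_3^2}\sum_{i=1}^n\sum_{j\neq j'}\begin{pmatrix}1\\ (T_{ij}-s)/h_3\\ (T_{ij'}-t)/h_3\end{pmatrix}K\Big(\frac{T_{ij}-s}{h_3}\Big)K\Big(\frac{T_{ij'}-t}{h_3}\Big)\{\epsilon_{ij}\epsilon_{ij'} - \sigma(T_{ij},T_{ij'})\} = O_p\Big(\sqrt{\frac{\log n}{nh_3^2}}\Big)$$
uniformly in $s$ and $t$, and $B_3(s,t)$ and $B_4(s,t)$ are bounded functions.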


We state in Theorem 1 the desirable equivalence property of $\hat\beta_{\hat\Sigma}$ to the oracle estimator $\hat\beta_\Sigma$. The proof uses Proposition 4; it is tedious and technical and thus is postponed to Section 5.4. We have not yet obtained a similar result for general link functions, even when the $m_i$'s are uniformly bounded; this is a topic for future research.


Theorem 1. Under the assumptions in Proposition 4, we have
$$\hat\beta_{\hat\Sigma} = \hat\beta_\Sigma + o_p(n^{-1/2}).$$

Suppose (3.1) fails to hold, but $\mathrm{Var}(\epsilon_i \mid T_i)$ can still be represented by some functions $\sigma^2(t)$ and $\sigma(s,t)$. Then Proposition 1 and Theorem 1 continue to hold if $\Sigma_i = \mathrm{Var}(\epsilon_i \mid X_i, Z_i, T_i)$ is replaced by $\mathrm{Var}(\epsilon_i \mid T_i)$. In this case we are still exploiting the information in $\mathrm{Var}(\epsilon_i \mid T_i)$.

Moreover, in Assumptions B(ii), B(iii) and B(iv) we can replace three times continuous differentiability with twice continuous differentiability plus Hölder continuity of the second derivatives of order $\alpha_1$, $\alpha_2$ and $\alpha_3$, respectively. In this case, the bandwidths in steps 2, 4 and 5 of our method have to satisfy $\sqrt n(h_1^{2+\alpha_1} + h_2^{2+\alpha_2} + h_3^{2+\alpha_3}) \to 0$. Note that $\alpha_3$ must be positive because step 5 of our procedure requires two-dimensional smoothing. Then we can prove similar results when $0 < \alpha_1 < 1$, $0 < \alpha_2 < 1$ and $0 < \alpha_3 < 1$. Specifically, the $O_p(h_j^3)$ terms in Proposition 4 will be replaced by $O_p(h_j^{2+\alpha_j})$, $j = 1,2,3$.
Remark 2. In Proposition 2, no assumptions are imposed on the structure of the $\Sigma_i$'s or on the conditional normality of the $\epsilon_i$'s. However, as mentioned before, it is difficult to estimate the $\Sigma_i$'s in a fully nonparametric way, and thus we impose assumption (3.1). On the other hand, when (3.1) holds, we should use this information in calculating the semiparametric efficient score function. Unfortunately, under general errors this task seems intractable and we have no results in this regard. Nevertheless, when (3.1) and some regularity conditions hold, we provide remedies that improve efficiency compared with using a working covariance structure. Indeed, $\hat\beta_{\hat\Sigma}$ has the smallest asymptotic variance among all $\hat\beta_V$ in this case, based on Propositions 1-3, Theorem 1, and the fact that it is an FGLS estimator. Furthermore, it is semiparametric efficient when $\epsilon_i$ is normally distributed conditionally on $X_i$, $Z_i$ and $T_i$, as discussed in A.1 of [23].
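Remark 2 relies on $\hat\beta_{\hat\Sigma}$ being an FGLS estimator. The minimal sketch below checks numerically that the profiled form of the GLS estimator, $H^{11}\sum_i(X_i - W_iH_{22}^{-1}H_{21})^TV_i^{-1}Y_i$ with $H^{11} = (H_{11} - H_{12}H_{22}^{-1}H_{21})^{-1}$, coincides with the $\beta$-block of a full GLS fit on $U_i = (X_i, W_i)$. Here $H_{11} = \sum_iX_i^TV_i^{-1}X_i$, $H_{12} = \sum_iX_i^TV_i^{-1}W_i = H_{21}^T$ and $H_{22} = \sum_iW_i^TV_i^{-1}W_i$, which is our reading of the blocks in (2.4). A quadratic polynomial basis stands in for the B-spline matrix $W_i$, and $V_i$ is a known exponential covariance; both are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)


def exp_cov(t, omega=1.0, rho=0.5):
    """Covariance omega * rho^{|s - t|} at the time points t (assumed known)."""
    return omega * rho ** np.abs(t[:, None] - t[None, :])


def make_cluster(m, p=2, k=3):
    t = np.sort(rng.uniform(size=m))
    X = rng.standard_normal((m, p))
    W = np.column_stack([t ** r for r in range(k)])   # stand-in for the spline basis
    return X, W, exp_cov(t)


clusters = [make_cluster(int(m)) for m in rng.integers(3, 8, size=50)]
beta0, gamma0 = np.array([1.0, -2.0]), np.array([0.5, 1.0, -1.0])
Ys = [X @ beta0 + W @ gamma0 + rng.multivariate_normal(np.zeros(len(V)), V)
      for X, W, V in clusters]

# Blocks of the GLS normal equations with weights V_i^{-1}
H11 = sum(X.T @ np.linalg.solve(V, X) for X, W, V in clusters)
H12 = sum(X.T @ np.linalg.solve(V, W) for X, W, V in clusters)
H22 = sum(W.T @ np.linalg.solve(V, W) for X, W, V in clusters)
H21 = H12.T
H_up11 = np.linalg.inv(H11 - H12 @ np.linalg.solve(H22, H21))   # H^{11}

# Profiled form: H^{11} sum_i (X_i - W_i H22^{-1} H21)' V_i^{-1} Y_i
beta_profiled = H_up11 @ sum(
    (X - W @ np.linalg.solve(H22, H21)).T @ np.linalg.solve(V, Y)
    for (X, W, V), Y in zip(clusters, Ys))

# Full GLS on U_i = (X_i, W_i); keep the beta block
A = sum(np.hstack([X, W]).T @ np.linalg.solve(V, np.hstack([X, W])) for X, W, V in clusters)
b = sum(np.hstack([X, W]).T @ np.linalg.solve(V, Y) for (X, W, V), Y in zip(clusters, Ys))
beta_full = np.linalg.solve(A, b)[:2]
```

Replacing the known $V_i$ by an estimate $\hat\Sigma_i$ in every block gives the feasible (FGLS) version of the same computation.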

Suppose we use cubic splines in the final spline estimator given in Step 7. Then, under the assumptions in Proposition 4, and assuming that the minimum eigenvalue of $H_{22.1} = H_{22} - H_{21}H_{11}^{-1}H_{12}$ is bounded below by $Cn/K_n$ for some positive constant $C$, we can show the following asymptotic normality:
$$\sqrt{n/K_n}\,\Lambda(t)^{-1/2}\{\hat g_S(t) - g_0(t)\} \xrightarrow{d} N(0, I_q),$$
where $\Lambda(t) = \lim_{n\to\infty}nK_n^{-1}(I_q\otimes B(t)^T)H_{22.1}^{-1}(I_q\otimes B(t))$. As for the updated local linear estimator given in Step 7, let $\mu_2 = \int u^2K(u)\,du$ and $\nu_0 = \int K(u)^2\,du$, and suppose the assumptions in Proposition 4 hold and $h_1 = Cn^{-1/5}$. Then we have the following asymptotic normality:
$$\sqrt{N_1h_1}\Big\{\hat g_L(t) - g_0(t) - \frac{\mu_2}{2}h_1^2g_0''(t)\Big\} \xrightarrow{d} N(0, \nu_0\Psi_L(t)),$$
where $\Psi_L(t) = A_1^{-1}A_2A_1^{-1}$,
$$A_1 = \lim_{n\to\infty}\frac{1}{N_1}\sum_{i=1}^n\sum_{j=1}^{m_i}E(Z_{ij}Z_{ij}^T \mid T_{ij} = t)f_{ij}(t),$$
$$A_2 = \lim_{n\to\infty}\frac{1}{N_1}\sum_{i=1}^n\sum_{j=1}^{m_i}E(Z_{ij}Z_{ij}^T \mid T_{ij} = t)f_{ij}(t)E(\epsilon_{ij}^2 \mid T_{ij} = t),$$
and $f_{ij}(t)$ denotes the density of $T_{ij}$.
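The constants $\mu_2$ and $\nu_0$ in the bias and variance of the local linear estimator depend only on $K$. The paper does not fix a kernel in this passage, so the sketch below assumes the biweight kernel $K(u) = \frac{15}{16}(1-u^2)^2$ on $[-1,1]$. This kernel is symmetric, compactly supported and continuously differentiable, as Assumption H(i) requires. The sketch evaluates both constants with a composite Simpson rule; the exact values are $\mu_2 = 1/7$ and $\nu_0 = 5/7$.

```python
import numpy as np


def biweight(u):
    """Biweight (quartic) kernel on [-1, 1]; its derivative vanishes at +-1."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u ** 2) ** 2, 0.0)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h / 3.0 * (y[0] + y[-1] + 4.0 * y[1:-1:2].sum() + 2.0 * y[2:-1:2].sum())


mu2 = simpson(lambda u: u ** 2 * biweight(u), -1.0, 1.0)   # exact value 1/7
nu0 = simpson(lambda u: biweight(u) ** 2, -1.0, 1.0)       # exact value 5/7
```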


4. Numerical studies. 

4.1. Simulation study. In the simulation study summarized in this section, the data were generated from the following model:
$$Y_{ij} = X_{ij}^T\beta_0 + Z_{ij}^Tg_0(T_{ij}) + \epsilon_i(T_{ij}),\quad j = 1,\dots,m_i,\ i = 1,\dots,n,$$
with the first component of $Z_{ij}$ taken as 1. The number of observation time points for the $i$th subject was set as $m_i = m_0 + \mathrm{binomial}(m_r, 0.65)$. The observation time points were then uniformly distributed over the intervals $[(j-1)/(m_0+m_r),\ j/(m_0+m_r)]$, $j = 1,\dots,m_i$. Note that when $m_i = m_0 + m_r$ the subject is observed at all follow-up time points, and when $m_i < m_0 + m_r$ the subject may be lost to follow-up. This setup is intended to mimic the more complicated scenarios that often arise in practice. We set $m_0 = 6$ and $m_r = 6$. We generated the other $(p+q-1)$-dimensional covariates from a multivariate Gaussian distribution, and we considered the following coefficient settings:

$p = 4$, $q = 4$, $\beta_0 = (5, 5, -5, -5)^T$ and
$$g_0(t) = \big(3.5\sin(2\pi t),\ 5(1-t)^2,\ 3.5\{\exp(-(3t-1)^2) + \exp(-(4t-3)^2)\} - 1.5,\ 3.5t^{1/2}\big)^T.$$

The random error process $\epsilon_i(t)$ was simulated from an ARMA(1,1) Gaussian process with mean zero and covariance function $\mathrm{cov}(\epsilon_i(s), \epsilon_i(t)) = \omega\rho^{|s-t|}$. We set $\omega = 4.95$ and considered $\rho = 0.4$ or $0.8$.
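The data-generating design above can be sketched as follows for a single subject. This is a hedged illustration rather than the authors' code. The error vector is drawn directly from the stated covariance $\omega\rho^{|s-t|}$ instead of by running an ARMA(1,1) recursion. The $(p+q-1)$ Gaussian covariates are assumed to be independent standard normals, since their correlation is not specified in the text.

```python
import numpy as np

rng = np.random.default_rng(2024)
M0, MR = 6, 6                   # m_0 and m_r
OMEGA, RHO = 4.95, 0.4          # omega; rho = 0.8 is the other setting
BETA0 = np.array([5.0, 5.0, -5.0, -5.0])


def g0(t):
    """The four varying coefficient functions g_0(t), one column each."""
    return np.column_stack([
        3.5 * np.sin(2 * np.pi * t),
        5 * (1 - t) ** 2,
        3.5 * (np.exp(-(3 * t - 1) ** 2) + np.exp(-(4 * t - 3) ** 2)) - 1.5,
        3.5 * np.sqrt(t),
    ])


def one_subject():
    m = M0 + rng.binomial(MR, 0.65)
    j = np.arange(1, m + 1)
    # T_ij is uniform on [(j - 1)/(m_0 + m_r), j/(m_0 + m_r)]
    t = (j - 1 + rng.uniform(size=m)) / (M0 + MR)
    # Gaussian errors with covariance omega * rho^{|s - t|}
    sigma = OMEGA * RHO ** np.abs(t[:, None] - t[None, :])
    eps = rng.multivariate_normal(np.zeros(m), sigma)
    # p = 4 covariates in X and q - 1 = 3 in Z besides the intercept;
    # independence across covariates is an assumption
    covs = rng.standard_normal((m, 7))
    X = covs[:, :4]
    Z = np.column_stack([np.ones(m), covs[:, 4:]])
    y = X @ BETA0 + np.sum(Z * g0(t), axis=1) + eps
    return t, X, Z, y, sigma


subjects = [one_subject() for _ in range(100)]
```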

We considered two types of working covariance structure: working independence covariances and the proposed covariance estimates. For the sake of comparison, we also considered using the true covariances and using the covariance estimator based on the crude raw residuals obtained from Step 1.

Throughout the numerical studies, following [9], we used cubic splines and took the spline dimension as $K_n = \lfloor 2n^{1/5}\rfloor$. For the efficient estimator, $h_1$ and $h_2$ were selected via the commonly used leave-one-subject-out cross-validation, and the bandwidth $h_3$ was set as $h_3 = 2h_1$. We report in Table 1 the average estimation bias and estimated standard error (SE) obtained from 200 repetitions. The empirical standard errors are very close to the estimated standard errors and thus are omitted. In general, the efficient estimator tends to yield smaller estimation bias and variance than the naive estimator assuming working independence. In particular, the standard error of the efficient estimator is only 20% to 50% of that of the working independence estimator, a remarkable reduction. In addition, the efficient estimator performs very similarly to the oracle estimator. The crude estimator is based on a simplified residual construction and therefore produces less accurate covariance estimates; consequently, its estimation bias and standard error are larger than those of the efficient estimator.
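A minimal sketch of the tuning choices just described follows, assuming a univariate local linear smoother with a biweight-shaped kernel. `spline_dim` computes $K_n = \lfloor 2n^{1/5}\rfloor$. `loso_cv` selects a bandwidth by leave-one-subject-out cross-validation, deleting one whole subject at a time so that within-subject correlation does not leak into the validation error. The smoother, the kernel and the function names are illustrative assumptions; the paper's steps 2 and 4 smooth more structured quantities.

```python
import numpy as np


def spline_dim(n):
    """Spline dimension K_n = floor(2 n^{1/5})."""
    return int(np.floor(2 * n ** 0.2))


def local_linear(t0, t, y, h):
    """Local linear estimate of E(y | t = t0) with a biweight-shaped kernel.

    The kernel's normalizing constant is omitted because it cancels in the
    weighted least-squares solution.
    """
    u = (t - t0) / h
    w = np.where(np.abs(u) <= 1.0, (1.0 - u ** 2) ** 2, 0.0)
    D = np.column_stack([np.ones_like(t), t - t0])
    A = D.T @ (w[:, None] * D)
    b = D.T @ (w * y)
    if np.linalg.matrix_rank(A) < 2:
        return np.nan  # fewer than two distinct points in the window
    return np.linalg.solve(A, b)[0]


def loso_cv(times, ys, grid):
    """Leave-one-subject-out CV score for each bandwidth in `grid`."""
    scores = []
    for h in grid:
        err = 0.0
        for i in range(len(times)):
            # Fit on all other subjects, predict at subject i's time points.
            t_rest = np.concatenate([t for k, t in enumerate(times) if k != i])
            y_rest = np.concatenate([y for k, y in enumerate(ys) if k != i])
            pred = np.array([local_linear(s, t_rest, y_rest, h) for s in times[i]])
            err += np.sum((ys[i] - pred) ** 2)
        scores.append(err)
    scores = np.array(scores)
    # Bandwidths whose windows were empty somewhere give nan and are skipped.
    return grid[int(np.nanargmin(scores))], scores


rng = np.random.default_rng(1)
times = [np.sort(rng.uniform(size=rng.integers(3, 9))) for _ in range(40)]
ys = [np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(t.size) for t in times]
grid = np.array([0.05, 0.1, 0.2, 0.3])
h_cv, cv_scores = loso_cv(times, ys, grid)
```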

Table 1
Estimation results of 200 simulations. “Independent” corresponds to $V_i = I_{m_i}$; “Efficient” refers to using $V_i = \hat\Sigma_i$; “Oracle” refers to using the true $\Sigma_i$ as $V_i$; “Crude” refers to using residuals directly from Step 1 to estimate the covariances; “Quadratic” refers to the QIF-based method.

               Independent   Efficient     Oracle        Crude         Quadratic
n    ρ    Coef.  bias   SE     bias   SE     bias   SE     bias   SE     bias   SE
100  0.4  β1    .0214 .0726   .0128 .0366   .0133 .0245   .0165 .0425   .0154 .0421
          β2   -.0218 .0727  -.0186 .0362  -.0146 .0251  -.0165 .0442   .0102 .0425
          β3   -.0309 .0718  -.0126 .0364  -.0147 .0245  -.0127 .0435   .0095 .0455
          β4    .0199 .0736   .0145 .0369   .0132 .0246   .0210 .0438  -.0113 .0398
200  0.4  β1   -.0072 .0525  -.0082 .0247  -.0028 .0176  -.0122 .0337   .0049 .0302
          β2    .0088 .0528   .0136 .0226   .0034 .0174   .0115 .0356   .0089 .0345
          β3   -.0071 .0526   .0075 .0256   .0112 .0174  -.0146 .0354  -.0076 .0312
          β4    .0094 .0525   .0124 .0272   .0132 .0178  -.0204 .0355  -.0075 .0305
100  0.8  β1    .0257 .0723   .0245 .0334  -.0070 .0109   .0347 .033    .0112 .0378
          β2   -.0179 .0731  -.0122 .0328  -.0112 .0106   .0436 .0332  -.0109 .0344
          β3    .0388 .0729  -.0257 .0335   .0214 .0107   .0279 .0332  -.0179 .0394
          β4   -.0193 .0735   .0447 .0334  -.0122 .0108  -.0345 .0326   .0184 .0404
200  0.8  β1    .0173 .0497   .0149 .0194   .0057 .0089   .0144 .0248   .0089 .0250
          β2    .0169 .0512  -.0146 .0196  -.0010 .0092  -.0167 .0242  -.0064 .0248
          β3   -.0364 .0499   .0145 .0190   .0058 .0090   .0135 .0232  -.0053 .0212
          β4    .0289 .0496  -.0139 .0182  -.0035 .0089  -.0222 .0238   .0083 .0196

There are also other existing methods based on estimating equations. We specifically considered the one based on the quadratic inference function (QIF) [18], in which, to incorporate the longitudinal dependence, the correlation matrix is approximated using a matrix expansion. We used the same basis matrices as recommended by [18], i.e., the first-order basis matrix with 0 on the diagonal and 1 off the diagonal, which is suitable for unequal cluster sizes and irregular time points. Any negative eigenvalue was set to zero whenever it occurred. From Table 1, this approach is more efficient than the estimator assuming working independence but less efficient than our proposed method. The QIF approach models the correlations indirectly through a matrix approximation, while our method models the covariances directly. The actual covariance structure may differ from the pattern suggested by the basis matrices in the quadratic inference function, and when that happens the QIF estimates may be less satisfactory than those from our nonparametric approach. Our method may therefore incorporate a more accurate covariance structure into the estimation and thus achieve better efficiency. Besides, in the QIF the covariance of the estimating function depends on the unknown parameters and must be estimated and integrated into the objective function, which may reduce the stability of the optimization.
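The two matrix operations mentioned in this comparison are simple to write down. The sketch below builds the QIF basis for a cluster of size $m$: the first-order matrix with 0 on the diagonal and 1 off the diagonal, together with the identity. Using the identity as the companion basis matrix is our assumption, following the usual QIF construction. The sketch also sets the negative eigenvalues of a symmetric matrix to zero. It does not implement the QIF estimator itself.

```python
import numpy as np


def qif_basis(m):
    """Identity and first-order basis matrices for a cluster of size m."""
    M0 = np.eye(m)
    M1 = np.ones((m, m)) - np.eye(m)   # 0 on the diagonal, 1 off the diagonal
    return M0, M1


def clip_negative_eigenvalues(S):
    """Symmetrize S and set its negative eigenvalues to zero."""
    S = (S + S.T) / 2.0
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T


# An indefinite "covariance" estimate (its determinant is negative)
S = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.9],
              [0.2, 0.9, 1.0]])
S_psd = clip_negative_eigenvalues(S)
M0, M1 = qif_basis(4)
```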

We next considered the situation where $m_i$ might diverge for some subjects $i$. We randomly selected $n_0 = Cn^{3/8}$ subjects whose observation points are $Bn^{1/8}m_i$ equally spaced points on $[0,1]$, and we let the remaining $n - n_0$ subjects have $m_i$ observations, where $m_i$ was generated in the same way as described above. All the other model settings are identical to those in the previous simulation studies. For different values of $B$ and $C$, we obtained the results given in Table 2. All the considered estimators improve, with smaller biases and standard errors than in the respective bounded-$m_i$ case. The efficient estimator still performs much better than the independent estimator in all cases. We do not report results for the QIF method of [18] here, as it is not tailored to the case of diverging $m_i$ and becomes relatively unstable in this case.
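A sketch of the diverging-$m_i$ design is given below, under assumptions the text leaves open. Both $n_0 = Cn^{3/8}$ and $Bn^{1/8}m_i$ are rounded to the nearest integer, and "equally spaced on $[0,1]$" is taken to include both endpoints. The helper names `diverging_sizes` and `equispaced_times` are hypothetical.

```python
import numpy as np


def diverging_sizes(base_m, B, C, rng):
    """Hypothetical helper: enlarge the cluster sizes of n_0 random subjects.

    n_0 = round(C n^{3/8}) subjects get round(B n^{1/8} m_i) observations;
    the rounding rule is an assumption.
    """
    n = len(base_m)
    n0 = int(round(C * n ** 0.375))
    chosen = set(rng.choice(n, size=n0, replace=False).tolist())
    sizes = np.array([
        int(round(B * n ** 0.125 * m)) if i in chosen else int(m)
        for i, m in enumerate(base_m)
    ])
    return n0, sizes


def equispaced_times(k):
    """k equally spaced observation times on [0, 1], endpoints included."""
    return np.linspace(0.0, 1.0, k)


rng = np.random.default_rng(3)
base_m = 6 + rng.binomial(6, 0.65, size=100)   # m_i = m_0 + binomial(m_r, 0.65)
n0, sizes = diverging_sizes(base_m, B=1.5, C=4, rng=rng)
```

With $n = 100$, $B = 1.5$ and $C = 4$, this gives $n_0 = 22$ enlarged subjects, each with at least 16 observation times.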

Table 2
Estimation results of 200 simulations. “Independent” corresponds to $V_i = I_{m_i}$; “Efficient” refers to using $V_i = \hat\Sigma_i$; “Oracle” refers to using the true $\Sigma_i$ as $V_i$. $B$ adjusts the diverging $m_i$ and $C$ controls the proportion of cases with diverging $m_i$.

B = 1.5, C = 4
               Independent   Efficient     Oracle
n    ρ    Coef.  bias   SE     bias   SE     bias   SE
100  0.4  β1    .0182 .0707   .0087 .0361  -.0017 .0204
          β2   -.0186 .0717  -.0172 .0329  -.0055 .0205
          β3   -.0236 .0702   .0041 .0336  -.0056 .0205
          β4    .0100 .0702  -.0034 .0346   .0008 .0205
200  0.4  β1   -.0130 .0517  -.0157 .0228  -.0037 .0153
          β2    .0146 .0516   .0177 .0227   .0028 .0151
          β3   -.0151 .0512   .0041 .0224   .0011 .0152
          β4   -.0076 .0517   .0065 .0229   .0038 .0153
100  0.8  β1    .0181 .0683  -.0175 .0213   .0028 .0102
          β2   -.0111 .0682  -.0147 .0203   .0028 .0102
          β3   -.0030 .0674  -.0105 .0199  -.0015 .0100
          β4    .0260 .0675   .0125 .0208   .0028 .0101
200  0.8  β1   -.0017 .0499  -.0024 .0132   .0014 .0076
          β2   -.0005 .0496   .0006 .0129  -.0001 .0076
          β3    .0045 .0499   .0041 .0133   .0004 .0076
          β4   -.0052 .0496  -.0059 .0130  -.0009 .0075

B = 1.5, C = 4
               Independent   Efficient     Oracle
n    ρ    Coef.  bias   SE     bias   SE     bias   SE
100  0.4  β1    .0105 .0710   .0039 .0315  -.0026 .0174
          β2   -.0180 .0715  -.0095 .0313  -.0046 .0174
          β3   -.0122 .0730  -.0104 .0323   .0010 .0176
          β4    .0141 .0707   .0105 .0317   .0034 .0174
200  0.4  β1   -.0085 .0510  -.0060 .0223  -.0036 .0134
          β2   -.0066 .0513  -.0062 .0225  -.0018 .0135
          β3    .0094 .0510  -.0015 .0225  -.0016 .0136
          β4    .0062 .0514   .0001 .0224   .0006 .0137
100  0.8  β1   -.0154 .0703   .0042 .0212  -.0040 .0087
          β2   -.0152 .0690   .0028 .0215   .0001 .0087
          β3    .0129 .0677   .0044 .0208  -.0002 .0092
          β4   -.0076 .0699  -.0032 .0215   .0008 .0088
200  0.8  β1   -.0141 .0489   .0111 .0157  -.0001 .0067
          β2   -.0136 .0490  -.0145 .0147  -.0003 .0069
          β3    .0058 .0491   .0016 .0142  -.0001 .0069
          β4    .0071 .0483   .0041 .0150  -.0001 .0072

We also conducted additional simulations to examine the performance of estimation of the nonparametric coefficients and the estimation accuracy of the parametric coefficients under modified approaches. Owing to space considerations, we report the results in the supplement [5].

4.2. Real data example. We now present an application of our method to the CD4 count data from the AIDS Clinical Trial Group 193A Study [11]. The data came from a randomized, double-blind study of AIDS patients with CD4 counts of < 50 cells/mm3. The patients were randomized to one of four treatments with roughly equal group sizes; each treatment included a daily regimen of 600 mg of zidovudine. Treatment 1 is zidovudine alternating monthly with 400 mg of didanosine; Treatment 2 is zidovudine plus 225 mg of zalcitabine; Treatment 3 is zidovudine plus 400 mg of didanosine; Treatment 4 is a triple therapy consisting of zidovudine plus 400 mg of didanosine plus 400 mg of nevirapine. Measurements of CD4 counts were scheduled to be collected at baseline and at eight-week intervals during the 40 weeks of follow-up. However, the actual observation times were unbalanced due to mistimed measurements, skipped visits and dropouts. The number of CD4 measurements during the 40 weeks of follow-up varied from 1 to 9, with a median of 4. The response variable was taken as the log-transformed CD4 count, $Y = \log(\text{CD4 count} + 1)$. Gender and baseline age were also recorded for each patient. A total of 1309 patients were enrolled in the study. We eliminated the 122 patients who dropped out immediately after the baseline measurement.

We considered the following available covariates: treatments 2, 3 and 4 (coded by three indicator variables for treatment groups 2, 3 and 4, respectively), age (years), sex (coded as 1 for male and 0 for female), and interaction effects between these covariates. Using the group SCAD structure identification procedure of Cheng et al. (2014), we found that the coefficients for treatment 3, treatment 4 and the interaction between treatment 2 and sex are varying, and the coefficients given in Table 3 are constant. The group SCAD procedure also suggested removing all the other interaction effects. The estimated varying intercept (i.e., the effect of treatment 1) and the varying coefficients are displayed in Figure 1 along with 95% confidence intervals. The curves in the figures are updated local linear estimates obtained without using the covariance function estimates. We used cross-validation to select the bandwidth. The constant coefficient estimates and their estimated standard errors are provided in Table 3. To facilitate comparison, we report the results of the estimator assuming working independence and of the efficient estimator proposed in this paper. Let $\delta = (\beta^T, \gamma^T)^T$ and $U_i = (X_i, W_i)$. In practice, the variances of the efficient parameter estimates were obtained from the first $p$ diagonal elements of the matrix
$$\Big(\sum_{i=1}^nU_i^T\hat\Sigma_i^{-1}U_i\Big)^{-1},$$
and the variances of the working independence parameter estimates were obtained from the first $p$ diagonal elements of the matrix
$$\Big(\sum_{i=1}^nU_i^TU_i\Big)^{-1}\Big(\sum_{i=1}^nU_i^T\hat\Sigma_iU_i\Big)\Big(\sum_{i=1}^nU_i^TU_i\Big)^{-1}.$$
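The two variance formulas can be computed as follows. This is a sketch under stated assumptions. The per-subject design matrices $U_i$ and covariance estimates $\hat\Sigma_i$ are taken as given, and a small simulated example replaces the CD4 data. The middle factor of the working independence sandwich is assumed to be $\sum_iU_i^T\hat\Sigma_iU_i$; a purely residual-based version would use $\hat\epsilon_i\hat\epsilon_i^T$ in place of $\hat\Sigma_i$.

```python
import numpy as np


def efficient_se(U, Sigma_hat, p):
    """SEs from the first p diagonal entries of (sum_i U_i' Sigma_i^{-1} U_i)^{-1}."""
    info = sum(Ui.T @ np.linalg.solve(Si, Ui) for Ui, Si in zip(U, Sigma_hat))
    return np.sqrt(np.diag(np.linalg.inv(info))[:p])


def independence_se(U, Sigma_hat, p):
    """SEs from the sandwich (sum U'U)^{-1} (sum U' Sigma U) (sum U'U)^{-1}."""
    bread = np.linalg.inv(sum(Ui.T @ Ui for Ui in U))
    meat = sum(Ui.T @ Si @ Ui for Ui, Si in zip(U, Sigma_hat))
    return np.sqrt(np.diag(bread @ meat @ bread)[:p])


rng = np.random.default_rng(7)
U, Sig = [], []
for m in rng.integers(2, 7, size=60):
    t = np.sort(rng.uniform(size=m))
    # two parametric covariates, then an intercept and t standing in for W_i
    U.append(np.column_stack([rng.standard_normal((m, 2)), np.ones(m), t]))
    Sig.append(0.6 ** np.abs(t[:, None] - t[None, :]) + 0.2 * np.eye(m))

se_eff = efficient_se(U, Sig, p=2)
se_ind = independence_se(U, Sig, p=2)
```

When the same $\hat\Sigma_i$ enters both formulas, the GLS variance is never larger than the working independence sandwich (the Gauss–Markov ordering), so `se_eff` does not exceed `se_ind`; with $\hat\Sigma_i = I$ the two coincide.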

Table 3
Estimation results for CD4 count data. “Independent” corresponds to using $V_i = I_{m_i}$; “Efficient” refers to using $V_i = \hat\Sigma_i$; “Quadratic” refers to the QIF-based method.

                   Independent        Efficient          Quadratic
Covariates         Coef.    SE        Coef.    SE        Coef.    SE
treatment 2         .3614   .2257      .4038   .2027      .3532   .1318
age                 .0946   .0274      .0818   .0245      .0882   .0171
sex                 .1704   .1768      .2246   .1587      .1187   .1034
treatment 3:sex    -.2922   .2472     -.2908   .2209     -.2625   .2485
treatment 4:sex    -.5321   .2416     -.5653   .2146     -.5580   .1574

Fig 1. Estimated varying coefficients along with 95% confidence intervals for the intercept (upper left), treatment 3 (upper right), treatment 4 (lower left), and the interaction between treatment 2 and sex (lower right). The red curves are efficient estimators while the green curves are estimators obtained under working independence.

Fig 2. Estimated treatment effects for the four treatment groups. The panels in the top, middle and bottom rows are respectively the proposed efficient estimates, the estimates assuming independence and the estimates based on the QIF method. The panels in the left and right columns are respectively for the females and the males. Red, green, blue and yellow curves are for treatment groups 1, 2, 3 and 4, respectively.

From Table 3, we note that the estimated constant coefficients for treatment 2, age, and the interaction between treatment 4 and sex are all quite significant. The constant coefficient estimates for sex are not significant but are still kept in the model, since we include the interactions between treatments and sex. The efficient estimates for all the constant and varying coefficients have smaller standard errors than the respective estimates assuming working independence. In fact, the Wald test statistic for the coefficient of treatment 2 is .3614/.2257 = 1.60 < 1.96 under working independence, which fails to declare a significant difference. On the other hand, the Wald test statistic for the same coefficient is .4038/.2027 = 1.99 > 1.96 under efficient estimation, which indicates a significant treatment difference. Other than this, because the sample size in this study was rather large, the two types of estimates for all the constant and varying coefficients appear to be very similar. For the sake of comparison, we also present the estimation results for these regression coefficients from the estimating equation method based on the QIF [18]. The conclusions on significance and effect direction remain the same as for the efficient estimation, while the magnitudes of the estimated coefficients differ slightly. For this particular dataset, the QIF estimator sometimes has a smaller standard error than the efficient estimator. One explanation is that its basis matrices encode a covariance structure such as compound symmetry, so it will be more efficient than our estimator when this structure is plausible (which is possibly the case here). Otherwise, it generally does not perform as well when the covariance structure is misspecified.
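The Wald comparisons quoted above can be reproduced directly from Table 3. The sketch below computes $z = \text{coefficient}/\text{SE}$ for every entry and a two-sided $p$-value from the standard normal distribution, which is the reference distribution implied by the 1.96 cut-off.

```python
import math

# (coefficient, SE) pairs transcribed from Table 3
TABLE3 = {
    "treatment 2": {"Independent": (0.3614, 0.2257), "Efficient": (0.4038, 0.2027),
                    "Quadratic": (0.3532, 0.1318)},
    "age": {"Independent": (0.0946, 0.0274), "Efficient": (0.0818, 0.0245),
            "Quadratic": (0.0882, 0.0171)},
    "sex": {"Independent": (0.1704, 0.1768), "Efficient": (0.2246, 0.1587),
            "Quadratic": (0.1187, 0.1034)},
    "treatment 3:sex": {"Independent": (-0.2922, 0.2472), "Efficient": (-0.2908, 0.2209),
                        "Quadratic": (-0.2625, 0.2485)},
    "treatment 4:sex": {"Independent": (-0.5321, 0.2416), "Efficient": (-0.5653, 0.2146),
                        "Quadratic": (-0.5580, 0.1574)},
}


def wald(coef, se):
    """Wald statistic and its two-sided p-value under a standard normal reference."""
    z = coef / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))


results = {(cov, method): wald(*est)
           for cov, fits in TABLE3.items() for method, est in fits.items()}
for (cov, method), (z, p) in results.items():
    print(f"{cov:16s} {method:12s} z = {z:6.2f}  p = {p:.3f}")
```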
In general, the CD4 count tends to increase with age in the fitted model. Our estimation results suggest that there are interaction effects between treatment and sex. Specifically, for the females (sex = 0), subjects receiving treatments 2, 3 and 4 tend to have increasingly higher CD4 counts than those under treatment 1. The effect of treatment 2 (as compared with treatment 1) is estimated as a constant and is significant, while those of the other two treatment groups are varying (the upper right and lower left panels of Figure 1), with even greater positive differences from treatment 1. For the males (sex = 1), subjects receiving treatments 2, 3 and 4 also tend to have higher mean CD4 counts than those receiving treatment 1. The interaction between treatment 2 and sex is varying over time (the lower right panel of Figure 1), while those for treatments 3 and 4 are constant. Judging from Table 3, the effects of treatments 3 and 4 are significantly different from that of treatment 1. We also notice that the differences between treatments seem to be greater among the females than among the males.

The estimated effects of the four treatment groups are plotted in Figure 2 for the efficient estimator, the working independence estimator and the QIF estimator. Note that the treatment effect curves given by the efficient estimator rarely cross each other, which gives a clear interpretation and ordering of the different treatments, whereas this is not the case for the curves given by the QIF or the working independence estimator. Previous authors identified a similar pattern in the order of magnitude of the time-varying treatment effects [14]. However, they ignored the interactions between the treatments and sex. Our findings suggest that the treatment effect curves might be rather different between the males and the females.

5. Proofs of the main results. 

5.1. Additional assumptions and definitions. We denote the Euclidean norm of a vector $a$ by $|a|$. Let $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ stand for the minimum and maximum eigenvalues of a symmetric matrix $A$, respectively. Besides, $C, C_1, C_2, \dots$ are generic positive constants whose values may vary from line to line. Recall that the density function of $T_{ij}$ is denoted by $f_{ij}(t)$, $i = 1,\dots,n$ and $j = 1,\dots,m_i$. We also denote the joint density function of $T_{ij}$ and $T_{ij'}$ ($j \neq j'$) by $f_{ijj'}(s,t)$. In Assumptions A1 and A2, we consider sparse and irregular observation times. Note that we carry out two-dimensional smoothing in step 5 and that three bandwidths are involved in our method. Therefore we impose these restrictive assumptions to avoid complicated assumptions involving $m_i$, $m_{\max}$ and the bandwidths simultaneously. Roughly speaking, these assumptions imply $\sum_{i=1}^nm_i^2 = O(n)$.
Assumption A1. For some positive constant $C_{A1}$, we have $m_{\max} = \max_{1\le i\le n}m_i = O(n^{1/8})$ and $\sum_{i=1}^nm_i^2 \le C_{A1}n$.

Assumption A2. The joint density functions $f_{ij}(t)$ and $f_{ijj'}(s,t)$ are uniformly bounded, and for some positive constant $C_{A2}$ we have
$$C_{A2}^{-1} \le \frac1n\sum_{i=1}^n\sum_{j=1}^{m_i}f_{ij}(t) \le C_{A2}\ \text{on } [0,1],\quad\text{and}\quad C_{A2}^{-1} \le \frac1n\sum_{i=1}^n\sum_{j\neq j'}f_{ijj'}(s,t) \le C_{A2}\ \text{on } [0,1]^2.$$

Assumption A3. For some positive constants $C_{A3}$ and $C_{A4}$, we have
$$C_{A3}I_{p+q} \le E\Big\{\begin{pmatrix}X_{ij}\\ Z_{ij}\end{pmatrix}(X_{ij}^T, Z_{ij}^T)\ \Big|\ T_{ij}\Big\} \le C_{A4}I_{p+q},$$
uniformly in $i$ and $j$.

Assumption A4. For some positive constants $C_{A5}$ and $C_{A6}$, we have $C_{A5} \le \lambda_{\min}(\Sigma_i) \le \lambda_{\max}(\Sigma_i) \le C_{A6}m_i$, uniformly in $i$.

Assumption A5. For some positive constants $C_{A7}$ and $C_{A8}$, we have $C_{A7} \le \lambda_{\min}(V_i) \le \lambda_{\max}(V_i) \le C_{A8}m_i$, uniformly in $i$.

Assumption A6. For some positive constants $C_{A9}$ and $C_{A10}$, we have $E\{\exp(C_{A9}|\epsilon_{ij}|) \mid X_i, Z_i, T_i\} \le C_{A10}$, uniformly in $i$ and $j$.

Assumption A3 is a standard one and is necessary for identification of the constant coefficients and the varying coefficient functions. When $\epsilon_i$ consists of some stochastic process plus i.i.d. errors, we have $\Sigma_i = \Sigma(T_i) + \sigma_e^2I_{m_i}$, where $\Sigma(T_i)$ is positive definite and $\sigma_e^2$ is the variance of the i.i.d. errors. Hence we impose Assumptions A4 and A5 on $\Sigma_i$ and $V_i$, respectively. In [4], it is assumed that $\epsilon_i$ has the sub-Gaussian property in order to deal with general link functions. The sub-Gaussian assumption prevents $m_i$ from tending to infinity. Assumption A6, which is less restrictive, is enough for the identity link function, since we do not need to employ any results from empirical process theory in this case.

For $g = (g_1,\dots,g_q)^T \in G$, we define the sup and $L_2$ norms by $\|g\|_{G,\infty} = \sum_{j=1}^q\sup_{t\in[0,1]}|g_j(t)|$ and $\|g\|_{G,2}^2 = \sum_{j=1}^q\int_0^1g_j^2(t)\,dt$. Assumptions A2 and A3 imply that there are positive constants $C_1$ and $C_2$ such that
$$C_1\|g\|_{G,2} \le \|Z^Tg\|_V \le C_2\|g\|_{G,2} \qquad (5.1)$$
for any $g \in G$. The details are given in Lemma 1. In (2.3), we define two kinds of projections of $X_k$. We define another one here:
$$\varphi^*_{Vk} = \Pi_{Vn}X_k = \mathop{\arg\min}_{g\in G_B}\|X_k - Z^Tg\|_n^2. \qquad (5.2)$$
5.2. Spline approximation and projections. Recall that we assume all the relevant functions are at least twice continuously differentiable and that they and their second-order derivatives are uniformly bounded. Hence the sup norm of the approximation errors by spline functions is bounded from above by $C_{\mathrm{approx}}K_n^{-2}$, where $C_{\mathrm{approx}}$ depends on the relevant functions. See Corollary 6.26 of [19].

Note that $\langle\cdot,\cdot\rangle_V$ and $\|\cdot\|_V$ are defined on $\{v \mid \|v\|_V < \infty\}$ and that $\{Z^Tg\}$ is a closed linear subspace due to (5.1). Therefore the projections $\varphi_{Vk} = (\varphi_{Vk1},\dots,\varphi_{Vkq})^T$, $k = 1,\dots,p$, exist uniquely. Next, we set $V_i^{-1} = (v_i^{j_1j_2})$. Note that $\varphi_{Vk} = \Pi_VX_k$ defined in (2.3) satisfies $\langle X_k - Z^T\Pi_VX_k, Z^Tg\rangle_V = 0$ for all $g \in G$. By writing out this equality explicitly, we can derive the following integral equations for $\varphi_{Vk}(t)$. For $d_1 = 1,\dots,q$,
$$\sum_{d_2=1}^qa^{(d_1d_2)}(t)\varphi_{Vkd_2}(t) = b^{(d_1)}(t) + \sum_{d_2=1}^q\int c^{(d_1d_2)}(s,t)\varphi_{Vkd_2}(s)\,ds, \qquad (5.3)$$
where
$$a^{(d_1d_2)}(t) = \frac1n\sum_{i=1}^n\sum_{j=1}^{m_i}E\{Z_{ijd_2}v_i^{jj}Z_{ijd_1} \mid T_{ij} = t\}f_{ij}(t),$$
$$b^{(d_1)}(t) = \frac1n\sum_{i=1}^n\sum_{1\le j_1,j_2\le m_i}E\{X_{ij_1k}v_i^{j_1j_2}Z_{ij_2d_1} \mid T_{ij_2} = t\}f_{ij_2}(t),$$
$$c^{(d_1d_2)}(s,t) = -\frac1n\sum_{i=1}^n\sum_{j_1\neq j_2}E\{Z_{ij_1d_2}v_i^{j_1j_2}Z_{ij_2d_1} \mid T_{ij_1} = s, T_{ij_2} = t\}f_{ij_1j_2}(s,t).$$
Let $A(t)$ be the matrix whose $(d_1,d_2)$th element is $a^{(d_1d_2)}(t)$. Assumptions A2 and A3 imply that $|A(t)| \neq 0$ on $[0,1]$, and we set $\psi_{Vkd_1}(t) = \sum_{d_2=1}^qa^{(d_1d_2)}(t)\varphi_{Vkd_2}(t)$. Then (5.3) reduces to (S.2) of [3] and the same argument there applies. Therefore $\varphi_{Vk}(t)$ has the required smoothness properties under similar regularity conditions.

5.3. Remarks on the proofs of Propositions 1-3. We can proceed as in [13] (and [3]) by replacing $Z_{ij}$, $Z_i$ and $\varphi_k(t)$ in [13] (and $Z_{ij}$, $Z_i$ and $\varphi_k(t)$ in [3]) with $W_{ij}$, $W_i$ and $Z^T\varphi_{Vk}(t)$, respectively. They used several lemmas in their proofs. We reorganize the corresponding lemmas in our setup into Lemma 1 below. Its proof and outlines of the proofs of Propositions 1-3 are given in the supplement [5].

Lemma 1. Assume that Assumptions A1-A5 hold.
(i) There are positive constants $C_1$ and $C_2$ such that for any $g \in G$,
$$C_1\|g\|_{G,2} \le \|Z^Tg\|_V \le C_2\|g\|_{G,2}.$$

(ii) There are positive constants $C_3$ and $C_4$ such that for any $g \in G_B$,
$$\|g\|_{G,\infty}^2 \le C_3K_n\|g\|_{G,2}^2 \le C_4K_n\|Z^Tg\|_V^2.$$

(iii) There is a positive constant $C_5$ such that for any $\beta \in \mathbb{R}^p$ and $g \in G_B$, $\|X^T\beta + Z^Tg\|_\infty \le C_5K_n^{1/2}\|X^T\beta + Z^Tg\|_V$, where $\|v\|_\infty = \max_{i,j}|v_{ij}|$. Besides, for some positive constant $C_6$, $\|v\|_V \le C_6\|v\|_\infty$.

(iv)
$$\sup_{g_1,g_2\in G_B}\frac{|\langle Z^Tg_1, Z^Tg_2\rangle_n - \langle Z^Tg_1, Z^Tg_2\rangle_V|}{\|Z^Tg_1\|_V\|Z^Tg_2\|_V} = O_p\big(K_n\sqrt{\log n/n}\big).$$

(v) For any positive constant $M$, we have $\langle X_j - Z^Tg_j, X_k - Z^Tg_k\rangle_n - \langle X_j - Z^Tg_j, X_k - Z^Tg_k\rangle_V = o_p(1)$ uniformly in $g_j \in G_B$ and $g_k \in G_B$ satisfying $\|g_j\|_{G,2} \le M$ and $\|g_k\|_{G,2} \le M$.

(vi) For any process $\delta_n$ taking scalar values at the $T_{ij}$ such that $\|\delta_n\|_\infty$ is uniformly bounded in $n$ and the values are mutually independent in $i$,
$$\sup_{g\in G_B}\frac{|\langle\delta_n, Z^Tg\rangle_n - \langle\delta_n, Z^Tg\rangle_V|}{\|Z^Tg\|_V} = O_p\big(\sqrt{K_n/n}\big)\|\delta_n\|_\infty.$$

(vii) Suppose in addition that Assumption S holds. Then for $k = 1,\dots,p$, $\|\varphi^*_{Vk}\|_\infty = O_p(1)$, $\|Z^T(\varphi^*_{Vk} - \varphi_{Vk})\|_n = o_p(1)$, and $\|Z^T(\varphi^*_{Vk} - \varphi_{Vk})\|_V = o_p(1)$.


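5.4. Proof of Theorem 1. Since we consider the identity link function, we have explicit expressions for $\hat\beta_\Sigma - \beta_0$ and $\hat\beta_{\hat\Sigma} - \beta_0$:
$$\hat\beta_\Sigma - \beta_0 = H^{11}\sum_{i=1}^n(X_i - W_iH_{22}^{-1}H_{21})^T\Sigma_i^{-1}\epsilon_i - H^{11}\sum_{i=1}^n(X_i - W_iH_{22}^{-1}H_{21})^T\Sigma_i^{-1}\{W_i\gamma^* - (Z^Tg_0)_i\} = I_1 - I_2\ \text{(say)}, \qquad (5.4)$$
$$\hat\beta_{\hat\Sigma} - \beta_0 = \hat H^{11}\sum_{i=1}^n(X_i - W_i\hat H_{22}^{-1}\hat H_{21})^T\hat\Sigma_i^{-1}\epsilon_i - \hat H^{11}\sum_{i=1}^n(X_i - W_i\hat H_{22}^{-1}\hat H_{21})^T\hat\Sigma_i^{-1}\{W_i\gamma^* - (Z^Tg_0)_i\} = \hat I_1 - \hat I_2\ \text{(say)}, \qquad (5.5)$$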
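where $H^{11}$, $H_{22}$ and $H_{21}$ are defined as in (2.4) with $V_i = \Sigma_i$, $i = 1,\dots,n$ (their hatted versions use $V_i = \hat\Sigma_i$), and $\gamma^* = (\gamma_1^{*T},\dots,\gamma_q^{*T})^T$ satisfies $\sup_{t\in[0,1]}|B(t)^T\gamma_j^* - g_{0j}(t)| \le C_gK_n^{-2}$, $j = 1,\dots,q$, for some positive constant $C_g$ depending on $g_0(t)$. Proposition 4 and Assumption A4 imply that, with probability tending to 1, $C_1I_{m_i} \le \hat\Sigma_i \le C_2m_iI_{m_i}$ uniformly in $i$ for some positive constants $C_1$ and $C_2$. As for $\hat\Sigma_i^{-1}$,
$$\hat\Sigma_i^{-1} = \Sigma_i^{-1} - \Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1} + \Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\hat\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}.$$
It follows from Proposition 4, Assumption A4, and the above identity that
$$\hat\Sigma_i^{-1} - \Sigma_i^{-1} = -\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1} + m_i^2O_p\Big(h_2^4 + h_3^4 + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big). \qquad (5.6)$$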

The last term on the right-hand side of (5.6) is understood in the sense of eigenvalue evaluation. By using Assumption A4 and Proposition 4, we get an expression for each element of $\Sigma_i^{-1}(\hat\Sigma_i - \Sigma_i)\Sigma_i^{-1}$. This expression, along with the assumptions of Theorem 1 and the local property of the B-spline basis, will be employed in the proofs of the following lemmas. These lemmas, which hold under the same assumptions as Theorem 1, are needed in order to evaluate $\hat I_1 - I_1$; their proofs are given in the supplement [5].
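The identity for $\hat\Sigma_i^{-1}$ that precedes (5.6) is exact, which is what allows the remainder in (5.6) to be controlled by the square of $\hat\Sigma_i - \Sigma_i$. The quick numerical check below uses an arbitrary covariance matrix and a small symmetric perturbation as illustrative stand-ins for $\Sigma_i$ and $\hat\Sigma_i$.

```python
import numpy as np

rng = np.random.default_rng(11)
m = 6
t = np.sort(rng.uniform(size=m))
# Stand-in for Sigma_i: exponential decay plus a nugget (an illustrative choice)
Sigma = 2.0 * 0.6 ** np.abs(t[:, None] - t[None, :]) + 0.5 * np.eye(m)
# A symmetric perturbation plays the role of the estimation error Sigma_hat - Sigma
E = 0.05 * rng.standard_normal((m, m))
D = (E + E.T) / 2.0
Sigma_hat = Sigma + D

S_inv = np.linalg.inv(Sigma)
S_hat_inv = np.linalg.inv(Sigma_hat)

first_order = S_inv - S_inv @ D @ S_inv
exact = first_order + S_inv @ D @ S_hat_inv @ D @ S_inv   # the identity before (5.6)

# Spectral norm of what remains after the first-order term
remainder = np.linalg.norm(S_hat_inv - first_order, 2)
```

The remainder is bounded by $\|\Sigma_i^{-1}\|^2\|\hat\Sigma_i^{-1}\|\,\|\hat\Sigma_i - \Sigma_i\|^2$ in spectral norm, which is the eigenvalue-type bound behind the second term of (5.6).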


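Lemma 2. Let $h_{12,kl}$ and $\hat h_{12,kl}$ be the $(k,l)$ elements of $H_{12}$ and $\hat H_{12}$, respectively. Then we have, uniformly in $k$ and $l$,
$$\frac1nh_{12,kl} = O_p(K_n^{-1}),\qquad \frac1n(\hat h_{12,kl} - h_{12,kl}) = K_n^{-1}O_p\Big(h_2^2 + h_3^2 + \sqrt{\frac{\log n}{nh_2}} + \sqrt{\frac{\log n}{nh_3^2}}\Big),$$
$$\Big\{\sum_l\Big(\frac1nh_{12,kl}\Big)^2\Big\}^{1/2} = O_p(K_n^{-1/2}),\qquad \Big\{\sum_l\Big(\frac1n(\hat h_{12,kl} - h_{12,kl})\Big)^2\Big\}^{1/2} = K_n^{-1/2}O_p\Big(h_2^2 + h_3^2 + \sqrt{\frac{\log n}{nh_2}} + \sqrt{\frac{\log n}{nh_3^2}}\Big).$$

Lemma 3. With probability tending to 1, $C_1K_n^{-1} \le \lambda_{\min}(n^{-1}H_{22}) \le \lambda_{\max}(n^{-1}H_{22}) \le C_2K_n^{-1}$ for some positive constants $C_1$ and $C_2$. We also have
$$\max\{|\lambda_{\min}(n^{-1}(\hat H_{22} - H_{22}))|, |\lambda_{\max}(n^{-1}(\hat H_{22} - H_{22}))|\} = K_n^{-1}O_p\Big(h_2^2 + h_3^2 + \sqrt{\frac{\log n}{nh_2}} + \sqrt{\frac{\log n}{nh_3^2}}\Big).$$
Hence we have $\max\{|\lambda_{\min}(n^{-1}\hat H_{22})|, |\lambda_{\max}(n^{-1}\hat H_{22})|\} = O_p(K_n^{-1})$, and
$$\max\{|\lambda_{\min}((n^{-1}\hat H_{22})^{-1} - (n^{-1}H_{22})^{-1})|, |\lambda_{\max}((n^{-1}\hat H_{22})^{-1} - (n^{-1}H_{22})^{-1})|\}$$
is also bounded from above by $K_nO_p\big(h_2^2 + h_3^2 + \sqrt{\log n/(nh_2)} + \sqrt{\log n/(nh_3^2)}\big)$.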


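Lemma 4. We have $\frac1n\hat H_{11} = \frac1nH_{11} + o_p(1)$ and $\frac1n\hat H_{12}\big(\frac1n\hat H_{22}\big)^{-1}\frac1n\hat H_{21} = \frac1nH_{12}\big(\frac1nH_{22}\big)^{-1}\frac1nH_{21} + o_p(1)$, where $o_p(1)$ holds both componentwise and in the sense of eigenvalue evaluation. Hence we have $n\hat H^{11} = nH^{11} + o_p(1)$.

Lemma 5. We have, for some positive constants $C_1$ and $C_2$,
$$\frac{C_1}{K_n}I_{qK_n} \le \mathrm{cov}\Big(\frac{1}{\sqrt n}\sum_{i=1}^nW_i^T\Sigma_i^{-1}\epsilon_i\Big) \le \frac{C_2}{K_n}I_{qK_n}.$$
In addition, we have
$$\Big|\frac{1}{\sqrt n}\sum_{i=1}^nW_i^T(\hat\Sigma_i^{-1} - \Sigma_i^{-1})\epsilon_i\Big| = \sqrt n\,O_p\Big(\frac{\log n}{nh_1} + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big) + \sqrt n\,O_p(h_1^3 + h_2^3 + h_3^3) + O_p(h_2^2 + h_3^2) + O_p\Big(\frac{1}{\sqrt{nh_2}} + \frac{1}{\sqrt{nh_3^2}} + \frac{1}{\sqrt{nK_nh_2}} + \frac{1}{\sqrt{nK_nh_3^2}}\Big).$$

Lemma 6. We have, for some positive constants $C_1$ and $C_2$,
$$C_1I_p \le \mathrm{cov}\Big(\frac{1}{\sqrt n}\sum_{i=1}^nX_i^T\Sigma_i^{-1}\epsilon_i\Big) \le C_2I_p.$$
In addition, we have
$$\Big|\frac{1}{\sqrt n}\sum_{i=1}^nX_i^T(\hat\Sigma_i^{-1} - \Sigma_i^{-1})\epsilon_i\Big| = \sqrt n\,O_p\Big(\frac{\log n}{nh_1} + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big) + \sqrt n\,O_p(h_1^3 + h_2^3 + h_3^3).$$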



Now we prove that Ii — I\ = o p (n 1 / 2 ). Write 

n n 

h = H 11 J2 XiVTh i -H u H 12 H- 1 £ wl sr'e* = ^ n (/n-Ii 2 ) (say). 
2=1 2=1 

We define In and I\ 2 similarly. From Proposition 1 and Lemma 4, we have 
only to prove 
























The former result in (5.7) can be handled in the same way as the latter and 
we consider only the latter. Write 


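$$\begin{aligned}
\frac{1}{\sqrt n}\big(\hat I_{12} - I_{12}\big) &= \frac{1}{n}\hat H_{12}\Big(\frac{1}{n}\hat H_{22}\Big)^{-1}\frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\big(\hat\Sigma_i^{-1} - \Sigma_i^{-1}\big)\epsilon_i \\
&\quad + \frac{1}{n}\hat H_{12}\Big\{\Big(\frac{1}{n}\hat H_{22}\Big)^{-1} - \Big(\frac{1}{n}H_{22}\Big)^{-1}\Big\}\frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\Sigma_i^{-1}\epsilon_i \\
&\quad + \Big(\frac{1}{n}\hat H_{12} - \frac{1}{n}H_{12}\Big)\Big(\frac{1}{n}H_{22}\Big)^{-1}\frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\Sigma_i^{-1}\epsilon_i \\
&= DI_{12}^{(1)} + DI_{12}^{(2)} + DI_{12}^{(3)} \quad \text{(say)}.
\end{aligned}$$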
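The three-term split of $n^{-1/2}(\hat I_{12} - I_{12})$ is an exact algebraic identity, obtained by adding and subtracting intermediate terms. It is not an approximation. The minimal numerical check below uses hypothetical stand-ins: `A` for $n^{-1}H_{12}$, `B` for $n^{-1}H_{22}$, and `s` for $n^{-1/2}\sum_i W_i^T\Sigma_i^{-1}\epsilon_i$. The hatted versions are small random perturbations of these.

```python
import numpy as np


def three_term_split(A_hat, A, B_hat, B, s_hat, s):
    """Pieces DI1, DI2, DI3 of A_hat B_hat^{-1} s_hat - A B^{-1} s."""
    B_hat_inv = np.linalg.inv(B_hat)
    B_inv = np.linalg.inv(B)
    di1 = A_hat @ B_hat_inv @ (s_hat - s)   # error from the estimated weights in the sum
    di2 = A_hat @ (B_hat_inv - B_inv) @ s   # error from the inverted Gram matrix
    di3 = (A_hat - A) @ B_inv @ s           # error from the cross-product matrix
    return di1, di2, di3


rng = np.random.default_rng(1)
p, K = 2, 8
A = rng.standard_normal((p, K))
M = rng.standard_normal((K, 3 * K))
B = M @ M.T / (3 * K)                       # positive definite stand-in for n^{-1} H_22
s = rng.standard_normal(K)
E = 1e-2 * rng.standard_normal((K, K))
A_hat = A + 1e-2 * rng.standard_normal((p, K))
B_hat = B + (E + E.T) / 2                   # symmetric perturbation
s_hat = s + 1e-2 * rng.standard_normal(K)

total = A_hat @ np.linalg.solve(B_hat, s_hat) - A @ np.linalg.solve(B, s)
di1, di2, di3 = three_term_split(A_hat, A, B_hat, B, s_hat, s)
print(np.allclose(di1 + di2 + di3, total))
```

Splitting the error this way isolates one estimated ingredient per term, so each term can be bounded with a separate lemma.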

Lemmas 2, 3, and 5 imply 

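$$\begin{aligned}
DI_{12}^{(1)} &= \sqrt{n}\,O_p\Big(\frac{\log n}{nh_1} + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2} + h_1^4 + h_2^4 + h_3^4\Big) \\
&\quad + \sqrt{K_n}\,O_p\Big(\frac{1}{\sqrt{nh_2}} + \frac{1}{\sqrt{nh_3^2}} + \frac{1}{\sqrt{nK_nh_2}} + \frac{1}{\sqrt{nK_nh_3^2}}\Big) + \sqrt{K_n}\,O_p(h_2^2 + h_3^2) = o_p(1), \\
DI_{12}^{(j)} &= \sqrt{K_n}\,O_p\Big(h_2^2 + h_3^2 + \sqrt{\frac{\log n}{nh_2}} + \sqrt{\frac{\log n}{nh_3^2}}\Big) = o_p(1), \qquad j = 2, 3.
\end{aligned}$$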
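The bound for $DI_{12}^{(2)}$ can be traced factor by factor. In the sketch below, $r_n$ is shorthand introduced here only for the common rate in Lemmas 2 and 3, so that $\|n^{-1}(\hat H_{22} - H_{22})\| = K_n^{-1}O_p(r_n)$. The norm $\|\cdot\|$ is spectral for matrices and Euclidean for vectors. Three further facts are used:

- $\|n^{-1}\hat H_{12}\| = O_p(K_n^{-1/2})$, from the row-wise bound of Lemma 2 with $p$ fixed.
- $\|(n^{-1}\hat H_{22})^{-1}\|$ and $\|(n^{-1}H_{22})^{-1}\|$ are both $O_p(K_n)$, by Lemma 3 when $r_n \to 0$.
- $E\|n^{-1/2}\sum_i W_i^T\Sigma_i^{-1}\epsilon_i\|^2 = \operatorname{tr}\operatorname{cov}(\cdot) \le C_2$, by Lemma 5.

$$\begin{aligned}
\Big\|\Big(\tfrac{1}{n}\hat H_{22}\Big)^{-1} - \Big(\tfrac{1}{n}H_{22}\Big)^{-1}\Big\| &= \Big\|\Big(\tfrac{1}{n}\hat H_{22}\Big)^{-1}\tfrac{1}{n}\big(H_{22} - \hat H_{22}\big)\Big(\tfrac{1}{n}H_{22}\Big)^{-1}\Big\| \le O_p(K_n)\cdot K_n^{-1}O_p(r_n)\cdot O_p(K_n) = K_nO_p(r_n), \\
\|DI_{12}^{(2)}\| &\le \Big\|\tfrac{1}{n}\hat H_{12}\Big\|\cdot K_nO_p(r_n)\cdot\Big\|\tfrac{1}{\sqrt n}\sum_{i=1}^n W_i^T\Sigma_i^{-1}\epsilon_i\Big\| = O_p(K_n^{-1/2})\cdot K_nO_p(r_n)\cdot O_p(1) = \sqrt{K_n}\,O_p(r_n).
\end{aligned}$$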
Hence we have established

$$\hat I_1 - I_1 = o_p(n^{-1/2}). \tag{5.8}$$

Next we deal with $\hat I_2 - I_2$, for which two more lemmas are needed.

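Lemma 7.

$$\Big\|\frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\Sigma_i^{-1}\big(W_i\gamma^* - (Z^Tg_0)_i\big)\Big\| = O_p(\sqrt{n}K_n^{-5/2})$$

and

$$\Big\|\frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\big(\hat\Sigma_i^{-1} - \Sigma_i^{-1}\big)\big(W_i\gamma^* - (Z^Tg_0)_i\big)\Big\| = \sqrt{n}K_n^{-5/2}O_p\Big(h_2^2 + h_3^2 + \sqrt{\frac{\log n}{nh_2}} + \sqrt{\frac{\log n}{nh_3^2}}\Big).$$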
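The following is a heuristic for where the orders in Lemmas 7 and 8 come from. It rests on assumptions that this section does not restate, labeled here as assumptions:

- (a) The spline approximation error is uniformly $O(K_n^{-2})$, that is, $\max_i\|W_i\gamma^* - (Z^Tg_0)_i\|_\infty = O(K_n^{-2})$.
- (b) Each B-spline basis function is bounded and nonzero only on an interval of length $O(K_n^{-1})$.
- (c) As a simplification, cluster sizes and $\|\Sigma_i^{-1}\|$ are bounded.

Under these assumptions, the mean of the $k$th coordinate satisfies

$$\Big|E\Big[\frac{1}{\sqrt n}\sum_{i=1}^n\big\{W_i^T\Sigma_i^{-1}\big(W_i\gamma^* - (Z^Tg_0)_i\big)\big\}_k\Big]\Big| \le \sqrt{n}\cdot O(K_n^{-1})\cdot O(K_n^{-2}) = O(\sqrt{n}K_n^{-3}).$$

The centered fluctuation of each coordinate is of order $K_n^{-5/2}$, which is smaller when $K_n = o(n)$. Stacking the $K_n$ coordinates therefore gives a Euclidean norm of at most $\sqrt{K_n}\cdot O(\sqrt{n}K_n^{-3}) = O(\sqrt{n}K_n^{-5/2})$. In Lemma 8 the sum has a fixed number $p$ of coordinates, each of order $\sqrt{n}\cdot O(K_n^{-2})$, which gives $O(\sqrt{n}K_n^{-2})$.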
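Lemma 8.

$$\frac{1}{\sqrt n}\sum_{i=1}^n X_i^T\Sigma_i^{-1}\big(W_i\gamma^* - (Z^Tg_0)_i\big) = O_p(\sqrt{n}K_n^{-2})$$

and

$$\frac{1}{\sqrt n}\sum_{i=1}^n X_i^T\big(\hat\Sigma_i^{-1} - \Sigma_i^{-1}\big)\big(W_i\gamma^* - (Z^Tg_0)_i\big) = \sqrt{n}K_n^{-2}O_p\Big(h_2^2 + h_3^2 + \sqrt{\frac{\log n}{nh_2}} + \sqrt{\frac{\log n}{nh_3^2}}\Big).$$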


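Now we can show that $\hat I_2 - I_2 = o_p(n^{-1/2})$. Write

$$I_2 = H^{11}\sum_{i=1}^n X_i^T\Sigma_i^{-1}\big(W_i\gamma^* - (Z^Tg_0)_i\big) - H^{11}H_{12}H_{22}^{-1}\sum_{i=1}^n W_i^T\Sigma_i^{-1}\big(W_i\gamma^* - (Z^Tg_0)_i\big) = H^{11}(I_{21} - I_{22}) \quad \text{(say)}.$$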


We define $\hat I_{21}$ and $\hat I_{22}$ similarly and write $\hat I_2 = \hat H^{11}(\hat I_{21} - \hat I_{22})$. From Proposition 1 and Lemma 4, we have only to prove $\frac{1}{\sqrt n}(\hat I_{21} - I_{21}) = o_p(1)$ and $\frac{1}{\sqrt n}(\hat I_{22} - I_{22}) = o_p(1)$. The former result can be handled in the same way as the latter, and we consider only the latter. Write



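$$\begin{aligned}
\frac{1}{\sqrt n}\big(\hat I_{22} - I_{22}\big) &= \frac{1}{n}\hat H_{12}\Big(\frac{1}{n}\hat H_{22}\Big)^{-1}\frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\big(\hat\Sigma_i^{-1} - \Sigma_i^{-1}\big)\big(W_i\gamma^* - (Z^Tg_0)_i\big) \\
&\quad + \frac{1}{n}\hat H_{12}\Big\{\Big(\frac{1}{n}\hat H_{22}\Big)^{-1} - \Big(\frac{1}{n}H_{22}\Big)^{-1}\Big\}\frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\Sigma_i^{-1}\big(W_i\gamma^* - (Z^Tg_0)_i\big) \\
&\quad + \Big(\frac{1}{n}\hat H_{12} - \frac{1}{n}H_{12}\Big)\Big(\frac{1}{n}H_{22}\Big)^{-1}\frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\Sigma_i^{-1}\big(W_i\gamma^* - (Z^Tg_0)_i\big) \\
&= DI_{22}^{(1)} + DI_{22}^{(2)} + DI_{22}^{(3)} \quad \text{(say)}.
\end{aligned}$$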


Lemmas 2, 3, and 7 imply, for $j = 1, 2, 3$,

$$DI_{22}^{(j)} = \sqrt{n}K_n^{-2}O_p\Big(h_2^2 + h_3^2 + \sqrt{\frac{\log n}{nh_2}} + \sqrt{\frac{\log n}{nh_3^2}}\Big) = o_p(1).$$
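The common factor $\sqrt{n}K_n^{-2}$ can be traced term by term. In this sketch, $r_n$ is shorthand introduced here only for the common rate in Lemmas 2, 3 and 7. The norms used are the following:

- $\|n^{-1}\hat H_{12}\| = O_p(K_n^{-1/2})$ and $\|n^{-1}(\hat H_{12} - H_{12})\| = K_n^{-1/2}O_p(r_n)$, from the row-wise bounds of Lemma 2 with $p$ fixed.
- $\|(n^{-1}\hat H_{22})^{-1}\|$ and $\|(n^{-1}H_{22})^{-1}\|$ are $O_p(K_n)$, and $\|(n^{-1}\hat H_{22})^{-1} - (n^{-1}H_{22})^{-1}\| = K_nO_p(r_n)$, from Lemma 3.
- The two sums in Lemma 7.

$$\begin{aligned}
DI_{22}^{(1)}&:\ O_p(K_n^{-1/2})\cdot O_p(K_n)\cdot\sqrt{n}K_n^{-5/2}O_p(r_n) = \sqrt{n}K_n^{-2}O_p(r_n), \\
DI_{22}^{(2)}&:\ O_p(K_n^{-1/2})\cdot K_nO_p(r_n)\cdot O_p(\sqrt{n}K_n^{-5/2}) = \sqrt{n}K_n^{-2}O_p(r_n), \\
DI_{22}^{(3)}&:\ K_n^{-1/2}O_p(r_n)\cdot O_p(K_n)\cdot O_p(\sqrt{n}K_n^{-5/2}) = \sqrt{n}K_n^{-2}O_p(r_n).
\end{aligned}$$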

Hence we have established $\hat I_2 - I_2 = o_p(n^{-1/2})$. The desired result follows from (5.4), (5.5), (5.8) and the above result.
SUPPLEMENTARY MATERIAL

Supplement A: Additional simulation results and technical material

(doi: xx.xxxx/xx-AOSxxxxSUPP). Additional simulation results, proofs of 
the propositions and lemmas, and theory for the case of uniformly bounded 
cluster size and general link function. 